2002 XML News

Tuesday, December 31, 2002

The Free Software Foundation is calling for the W3C to make their patent policy GPL compatible, which it apparently still is not because it allows patent holders to limit their royalty-free licenses to spec implementations only. Comments to the W3C are due by midnight today, but in practice the W3C often listens to good comments past the official date, so don't hesitate to contribute just because you're not reading this until the New Year.

Sunday, December 29, 2002

Eddie Robertsson has written an open source implementation of Schematron 1.5 in Java based on JAXP's TrAX. The distribution also supports embedded Schematron in RELAX NG schemas, using James Clark's Jing package.

Saturday, December 28, 2002

Morphon has released version 1.3.3 of the Morphon CSS-Editor, a $39 payware Cascading Stylesheet editor written in Java that supports CSS1, CSS2, SVG, and the CSS Mobile Profile.

Morphon has also posted the third beta of the Morphon XML Editor 3.0. This editor provides WYSIWYG, source, and tree views, supports Unicode 3.0, and can print preview and spell check XML documents. The Morphon XML Editor is $150 payware.

Sunday, December 22, 2002

I'll be visiting family for the Christmas week. I should have Internet access, but updates will probably be a little slow here until next weekend. Merry Christmas, Joyeux Noël, Happy Hanukkah, a Festive Kwanzaa, Season's Greetings, and Happy New Year to you all.


As part of my continuing end-of-the-year InBox purging (Only 1400 messages to go!), yesterday I went through all the mail announcing conferences and updated the XML Conferences page. So far, I know that next year I'll be speaking at OOP 2003 in Munich in January, XML & Web Services 2003 in London in March, Software Development 2003 West in Santa Clara in March, and Software Development 2003 East in Boston in September.


Benoît Marchal's HC (short for Handler Compiler) "takes some drudgery out of event-based XML parsing by automatically generating the SAX ContentHandler for a list of XPaths."
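
To give a sense of the drudgery involved, here is a minimal hand-written sketch in Python (not Java, and not HC's actual output) of the kind of handler such a tool would generate for a single path. The names PathHandler and extract and the sample path are made up for illustration, and only absolute child paths such as /catalog/book/title are handled, not general XPath.

```python
import xml.sax


class PathHandler(xml.sax.ContentHandler):
    """Collects the text content of every element whose path equals `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target   # hypothetical example: "/catalog/book/title"
        self.stack = []        # names of the currently open elements
        self.buffer = None     # text chunks gathered while inside a match
        self.matches = []

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self._path() == self.target:
            self.buffer = []

    def characters(self, content):
        # Text inside descendants of a matching element is collected too.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self._path() == self.target:
            self.matches.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()


def extract(xml_text, target):
    """Parse xml_text with SAX and return the text of all elements at target."""
    handler = PathHandler(target)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matches
```

Every additional path means more state to track by hand, which is the bookkeeping a generator can take over.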


Lucid'i.t. has released the Lucid XML Toolkit 1.0. After peeling away several layers of press-release hype, this appears to just be another schema validating XML parser written in Java that supports the usual batch of API acronyms: SAX1, SAX2, DOM Level 1 and 2, JAXP, and so forth. So far only the free-beer personal edition is available.


Adaptinet has released XML-Serializer, an XML-To-Java data binding tool that supports DTDs and XML schemas (but apparently not simple, well-formed XML). XML-Serializer costs $49.95.


Roger L. Costello and David B. Jacobs have begun work on the collaborative development of a distributed, decentralized Web service registry. They seem to be thoroughly annoying the ebXML community, which wants to have a centralized monopoly on web services registries. "The purpose of this effort is to develop a concrete, implementable architecture for a highly distributed registry. The notion is that each Web service defines their own registry - comprised of the collection of documents that describes the service."

Saturday, December 21, 2002

As part of my continuing end-of-the-year InBox purging (Only 1500 messages to go!), yesterday I went through all the mail announcing conferences and updated the Java Conferences page. So far, I know that next year I'll be speaking at OOP 2003 in Munich in January, XML & Web Services 2003 in London in March, Software Development 2003 West in Santa Clara in March, and Software Development 2003 East in Boston in September.


The W3C XML Linking Working Group has posted a new working draft of the XPointer xpointer() Scheme. This draft cleans up the specification a lot, but does not appear to make any significant changes to the language syntax or semantics.

The working group's charter expires at the end of the year, and there seems to be little will in the W3C for continuing with this. Since XPointer hasn't even gotten to last call working draft yet, it seems unlikely that this will ever be finished. This may be a good thing. The full syntax just seems too ugly and verbose to achieve widespread adoption. Even within the W3C, there's significant dissent. Outside the W3C, reactions to XPointer range from open hostility to apathy, but nobody seems to actually like it.


The W3C XHTML working group has published a new working draft of XHTML 2.0, just one week after the last working draft. According to the abstract, "It was released very soon after the second public Working Draft because of production errors, and does not reflect any major changes from that draft." XHTML 2.0 is the next, backwards incompatible version of HTML that incorporates XFrames, XForms, and lots of other crunchy XML goodness. However, XLink is not yet included and may never be. (The HTML Working Group are extreme XLink skeptics.)


The W3C Math Working Group has published the first working draft of the second edition of the MathML 2.0 specification. Like the second edition of XML 1.0, this focuses on incorporating errata and cleaning up the language of the spec, rather than on introducing new features or changing existing ones. Among other improvements, this draft now includes a schema for MathML 2.0 as well as a DTD. The MathML 2.0 described here should be completely identical to the one described by the first edition.


The W3C Quality Assurance (QA) Activity has posted the first public working draft of the QA Framework: Test Guidelines. "This document defines a set of common guidelines for conformance test materials for W3C specifications."


The W3C Web Services Internationalization Task Force has published the first public working draft of Web Services Internationalization Usage Scenarios. This describes various issues that arise when using SOAP services in multi-language environments. For example, is it possible to send error messages in both English and Japanese?


And speaking of Japanese, the Center for Global Communications, International University of Japan (GLOCOM), Infoteria Corporation, and Media Fusion Co., Ltd have submitted a note to the W3C describing ongoing work within Japan on Embedding Glyph Identifiers in XML Documents. This describes a means to specify exactly which glyph should be used for a single Unicode character, something which is apparently a lot more significant in ideographic languages than alphabetic ones. Glyph selection would use a special glyph:name attribute that references a glyph by its position in the ISO/IEC 10036:1996, Information Technology -- Font information interchange -- Procedures for registration of font-related identifiers standard. This note will not be put on the W3C standards track. Instead, it's being moved through the Japanese Standards Association (JSA).

Friday, December 20, 2002

The W3C XML Protocol Working Group has released the candidate recommendations of the three SOAP 1.2 specifications:

The namespace URIs are now http://www.w3.org/2002/12/soap-envelope and http://www.w3.org/2002/12/soap-encoding. Otherwise, it's not obvious to me what the changes are in this draft. Since I think there are multiple fairly deep and fundamental flaws in SOAP, and they don't seem likely to be fixed, I haven't been following this work too closely. Comments are due by January 24.


Robert C. Lyons has released the Turing Machine Markup Language (TMML). This is an XML application for describing Turing machines. He also wrote a TMML interpreter in XSLT that executes the Turing machine described in the TMML source document. This is another proof by construction that XSLT 1.0 is Turing-complete.
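
For readers who haven't looked at a Turing machine since school, the information such a description has to carry is small: a transition table, a start state, a halt state, and an initial tape. The sketch below is a plain Python interpreter for that data, not TMML and not the XSLT interpreter; the function name, the dictionary layout, and the bit-flipping sample machine are all assumptions made for illustration.

```python
def run_turing_machine(transitions, tape, start, halt, blank="_", max_steps=10_000):
    """Run a one-tape Turing machine and return the final tape contents.

    transitions maps (state, symbol) to (symbol_to_write, move, next_state),
    where move is "L" or "R". This layout is hypothetical, not TMML's.
    """
    cells = dict(enumerate(tape))   # sparse tape: position -> symbol
    state, head, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, blank)
        new_symbol, move, state = transitions[(state, symbol)]
        cells[head] = new_symbol
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Sample machine: flip every bit, halt at the first blank cell.
FLIP_BITS = {
    ("scan", "0"): ("1", "R", "scan"),
    ("scan", "1"): ("0", "R", "scan"),
    ("scan", "_"): ("_", "R", "done"),
}
```

Calling run_turing_machine(FLIP_BITS, "1011", "scan", "done") walks the tape left to right and inverts each bit. Expressing the same loop with recursive templates is what makes the XSLT version a proof of Turing-completeness.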


John Cowan's posted TagSoup 0.8, a "parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML."
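
The idea of getting SAX-style events out of markup that isn't well-formed can be sketched with Python's standard html.parser module. This is only an analogy, not TagSoup: unlike TagSoup, it reports tags as it finds them and does not repair the document into a well-formed tree. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start-tag, end-tag, and text events without requiring well-formedness."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.events.append(("text", text))


def soup_events(html):
    """Feed possibly malformed HTML to the parser and return the event list."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.events
```

Feeding it something like "<p>one<p>two<br><b>bold", with no end tags at all, yields events rather than a parse error, which is the property that lets ordinary event-driven code run over wild HTML.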

Thursday, December 19, 2002

The W3C XML Core Working Group has posted the candidate recommendation of Namespaces in XML 1.1. There don't appear to be any functional changes since the last call working draft. Changes since 1.0 are:

  • IRI references are used instead of URI references
  • Namespace prefixes can be undeclared
  • The prefix xmlns is by definition bound to the namespace name http://www.w3.org/2000/xmlns/.

Mark Hale's released version 0.881 of JSci, a Java class library containing many useful mathematical and scientific functions such as complex arithmetic. The major new feature in this release is that all the periodic table classes have been replaced by XML documents (packaged in the jar file). "Element objects can be instantiated using the new factory method PeriodicTable.getElement(). This makes the data easier to update and more accessible for processing. As a consequence, a periodic table reference in HTML is now included with the documentation - generated by applying an xsl stylesheet to the xml files."


The Mozilla Project has posted the first alpha of Mozilla 1.3. This release focuses on mail. Message Views is a new feature "which will help users locate, organize and prioritize their mail messages. A View is used to filter and display only those messages matching a given set of criteria. Mozilla ships with a set of pre-defined Views, but users can also create their own."

Mail filter actions have also been improved, as has compatibility with some mail servers. The biggest new feature is junk mail classification so you can train your client to distinguish between good mail and junk mail. I've recently installed SpamAssassin for this purpose and it seems to be catching about 60% of the spam I receive, including almost all of the most egregious examples. So far I haven't seen any false positives, and I'm going to try turning down the number of hits required to identify e-mail as spam.

Finally, Mac OS 9 support has been removed from this release. Third parties may release their own OS 9 builds.


IBM's alphaWorks has released version 3.3.1 of their Web Services Toolkit. The "basic software components needed to create a Web services environment are provided with Web Services Tool Kit. Included is an architectural blueprint (Web Services Architecture), sample programs, Utility services, and some tools that are helpful in developing and deploying Web services. Extensive documentation is included to assist developers with the basic concepts of Web services. The tool kit also includes a fully-functioning Web services client API that can be used to directly access a UDDI registry." New features in 3.3.1 include WS-Policy and WSRP. Java 1.3 or later is required.

Wednesday, December 18, 2002

Opera Software has posted the second public beta of Opera 7.0 for Windows, a $39 payware web browser that supports XML and CSS. Beta 2 adds some user interface improvements including fast forward access to the pages you will want to visit next, a one-click log-in password manager, a links panel that displays all links in the current page, one-click skin install, and multiple user style sheets.


The World Wide Web Consortium (W3C) User Agent Accessibility Guidelines Working Group (UAWG) has released the final recommendation of User Agent Accessibility Guidelines 1.0. The abstract states, "This document provides guidelines for designing user agents that lower barriers to Web accessibility for people with disabilities (visual, hearing, physical, cognitive, and neurological). User agents include HTML browsers and other types of software that retrieve and render Web content. A user agent that conforms to these guidelines will promote accessibility through its own user interface and through other internal facilities, including its ability to communicate with other technologies (especially assistive technologies). Furthermore, all users, not just users with disabilities, should find conforming user agents to be more usable."

The UAWG has also published a complementary note covering Techniques for User Agent Accessibility Guidelines 1.0. "These techniques address key aspects of the accessibility of user interfaces, content rendering, application programming interfaces (APIs), and languages such as the Hypertext Markup Language (HTML), Cascading Style Sheets (CSS) and the Synchronized Multimedia Integration Language (SMIL)."


The W3C Device Independence Working Group has posted the first public Working Draft of Delivery Context Overview for Device Independence. "Delivery context information is typically used to provide an appropriate format, styling or other aspect of some web content that will make it suitable for the capabilities of a presentation device. The selection or adaptation required to achieve this may be performed by an origin server, by an intermediary in the delivery path, or by a user agent."


Andrzej Jan Taramina has posted a of GPSml, a "standard, comprehensive and functional markup language that can express and encode the full gamut of data generated by GPS (Global Position System) devices, including real time position information and collections of points (waypoints, routes, tracks, etc.)." The latest version of his GPL'd Java GPS Access Library generates GPSml output.

Tuesday, December 17, 2002

Kohsuke Kawaguchi has released JARV, a vendor-neutral, implementation-independent and schema language independent Java API interface for XML validation. With the right validation engines installed, it can support RELAX NG, TREX, the W3C XML Schema Language, and DTDs. JARV is open source under an MIT license.


Alex Chaffee has updated his XPath Explorer to add an Expand/Collapse All Nodes menu item and an Open Location menu item. This is an open source, graphical tool for displaying an XML document as a tree and evaluating XPath expressions with respect to that document. This release is also faster when opening large files and uses a progress bar when loading or expanding so it shouldn't appear hung just because an operation takes a while.


Syncro Soft has released version 1.2.4 of <oXygen/>, a $65 payware XML editor written in Java that can run as an applet. <oXygen/> 1.2 supports XSLT and XSL-FO, among other features. Version 1.2.4 adds about a dozen small new features including document templates, auto-completion of end-tags, and smart indenting.


RenderX has released a COM wrapper for the XEP XSL Formatting Objects to PDF converter. XEP is written in Java. The converter uses the Java Native Interface (JNI) to translate COM methods to Java virtual machine calls. Java 1.2.2 or later is required. This component only works with the $999.95 Developer and $4999.95 Server versions of XEP (and crippled variations thereof), not the cheaper client version.


Eric van der Vlist, the author of the O'Reilly XML Schema book, has begun writing a book about Relax NG that will be published both on paper and online under the GNU Free Documentation License.

Monday, December 16, 2002

W. Eliot Kimber has founded the EXSLFO project as "a community effort to define functional extensions to the XSL Formatting Objects specification in advance of development of new versions of the XSL FO specification by the W3C. It is intended to be an adjunct to the formal W3C specification development process. It is modeled on the existing EXSLT activity (http://exlst.org/)." Possible extensions might include PDF bookmarks and metadata, capturing of page-to-object mappings, non-rectangular pages, etc. All interested parties, vendors and users, are invited to participate.


Danny Vint's published two Quick Reference Cards for XML Schemas. The PDF files are set up for 11" x 17" paper, but can be shrunk to smaller page sizes.


eSVG 1.4 is an implementation of subsets of the SVG 1.1 and SVG Mobile specifications designed for integration into embedded systems. The eSVG project additionally provides multithreaded eSVG scripting according to the SVG DOM 2 interface specification. eSVG scripting is based on the SpiderMonkey (JavaScript-C) engine and ORMIDE. Version 1.4 supports most SVG Tiny profile features, SVG Basic profile features, and SVG DOM interface entries. eSVG currently runs on Windows 98/NT/2000/ME/XP, Windows CE, and UniOP MMI. Pricing is deliberately hidden, so it can't be good.


Roger L. Costello of The MITRE Corporation has published a W3C XSD Schema containing simpleType definitions that enumerate various units of temperature, length, weight, volume, area and so forth.


Vasil I. Yaroshevich has released xsltml 2.0, an XSLT MathML Library that provides a set of XSLT templates for MathML 2.0 to LaTeX translation implemented in pure XSLT. No extension functions are used.


The Simple XML Data Manipulation Language (SiXDML) is a SQL like (query/update/insert/delete) language for working with XML documents. There is currently an implementation in Java for the Xindice native XML database.


David Rosenborg has released three utilities for working with RELAX NG schemas written in the Compact Syntax:

  • An Emacs mode for editing in the compact syntax.
  • RngToRnc.xsl: an XSLT stylesheet that transforms from the XML syntax to the compact syntax.
  • RncReader: a SAX2 parser that reads a compact syntax schema and sends events to the application as if it was reading a schema in the XML syntax.

All three utilities are available under a BSD license.


Fraunhofer IPSI has released IPSI-XQ 1.2.2, a prototype XQuery processor written in Java. Pricing has not yet been announced.


eXchaNGeR 0.9 is an open source XML-Browser/Editor. By default, it uses a tree view, but can be configured with special viewers and editors for different XML applications. eXchaNGeR is written in Java and published under the Mozilla Public License.

Sunday, December 15, 2002

I'm doing my end-of-year mailbox cleansing over the next couple of weeks. As I try to reduce my inbox from almost 4000 messages down to something more manageable, those of you who've written to me in the last year or two may be getting some belated replies. So far I've reached July of 2001, and I'm down to 3200 messages. :-)


The W3C XML Schema Working Group is soliciting comments on version 1.1 of the W3C XML Schema language. 1.1 "is intended to be mostly compatible with XML Schema 1.0 and to have approximately the same scope, but also to fix bugs and make whatever improvements we can, consistent with the constraints on scope and compatibility." Comments should be directed to www-xml-schema-comments@w3.org.


Anders Møller at the BRICS research center of the University of Aarhus has released Document Structure Description 2.0 (DSD2), yet another schema language for XML. This one is based on boolean logic and regular expressions, and does not support typing. It reminds me a little of Schematron, though it's not based on XPath. It does allow the sort of context-sensitive constraints that Schematron allows.

Saturday, December 14, 2002

I'm doing my end-of-year mailbox cleansing over the next couple of weeks. As I try to reduce my inbox from almost 4000 messages down to something more manageable, those of you who've written to me in the last year or two may be getting some belated replies. :-)


Anthony B. Coates has released mtxslt, an Ant task that allows multiple different XSLT engines to be used during the same build.


Henry S. Thompson has released a new version of his XSV W3C XML Schema Language Validator. This release features "improved conformance by computing and using values where required, e.g. for enumeration checks and 'fixed' element/attributes, and some support for date, time and dateTime." XSV can be run as a web form or a Windows executable.


Microsoft has released version 1.0 of the Microsoft XSD Inference utility, a tool for creating a W3C XML Schema Language (XSD) schema from an XML instance document.

Friday, December 13, 2002

The W3C XHTML working group has published a new working draft of XHTML 2.0, the next, backwards incompatible version of HTML that incorporates XFrames, XForms, and lots of other crunchy XML goodness. However, XLink is not yet included and may never be. (The HTML Working Group are extreme XLink skeptics.)


The W3C has released version 7.1 of Amaya, their open source, test bed web browser and editor for Windows and Linux that supports HTML 4.01, XHTML 1.0, XHTML Basic, XHTML 1.1, HTTP 1.1, and CSS, as well as providing partial support for MathML 2.0 and SVG. This is a bug fix release.


James Clark's Trang translates schemas written in RELAX NG into different formats. In particular, it can

  • Translate a RELAX NG schema in the compact syntax into the XML syntax
  • Translate a RELAX NG schema in either the XML or compact syntax into a DTD

Clark says, "Trang aims to produce human-understandable schemas; it tries for a translation that preserves all aspects of the input schema that may be significant to a human reader, including the definitions, the way the schema is divided into files, annotations and comments." Trang is written in Java and published under a very open license.

Thursday, December 12, 2002

Christian Neumann's posted LibXMLight 0.1.1, "a non-validating, lightweight XML Parser Library written in C++. The API is similar to SAX." Version 0.1.1 adds API documentation. LibXMLight is published under the GPL.


Daniel Veillard's released version 2.4.30 of libxml2, the GNOME XML parser for Linux. Version 2.4.30 restores the Python support accidentally broken in 2.4.29.


Sun's posted the proposed final draft (version 0.9.0) of the Java Architecture for XML Binding 1.0 (JAXB) on the Java Developer Connection (registration required). This includes a spec, API docs, and a reference implementation. JAXB compiles an XML schema into one or more Java classes. (First mistake: JAXB assumes there's a schema. Second mistake: It assumes the schema is written in the W3C XML Schema Language. Third mistake: It assumes documents actually adhere to the schema.) JAXB can unmarshal schema-valid XML into Java objects; read, update, and validate the Java objects against the schema; and write the result back out as XML.

Wednesday, December 11, 2002

Netscape has released version 7.0.1 of their namesake web browser. Netscape 7.0.1 is based on Mozilla 1.0.2 and supports XML, HTML, XHTML, CSS, XSLT, RDF, DOM, and assorted other cool acronyms. The big new feature in this release is pop-up blocking (though Mozilla users have had this power for a while).


Devsphere has released XML Tag Library, an open source Java Server Page (JSP) tag library for processing XML. It complements the JSP Standard Tag Library by adding SAX parsing and DOM serialization.

Tuesday, December 10, 2002

The World Wide Web Consortium (W3C) XML Encryption Working Group has released XML Encryption Syntax and Processing and Decryption Transform for XML Signature as final Recommendations. XML Encryption is a syntax for encrypting documents, elements, or other data and embedding or pointing to the encrypted text in XML documents using Base-64 encoding. A variety of algorithms are supported. The XML Signature decryption transform "enables XML Signature applications to distinguish between those XML Encryption structures that were encrypted before signing (and must not be decrypted) and those that were encrypted after signing (and must be decrypted) for the signature to validate."
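
The embedding half of that can be sketched in a few lines of Python. This is a minimal illustration and performs no encryption at all: the caller passes in bytes that some cipher has already produced, and the function names are made up. The namespace URI and the EncryptedData, CipherData, and CipherValue element names are the ones the Recommendation defines; a real EncryptedData element would normally also carry EncryptionMethod and KeyInfo children, which are omitted here.

```python
import base64
import xml.etree.ElementTree as ET

XENC = "http://www.w3.org/2001/04/xmlenc#"   # XML Encryption namespace
NS = {"xenc": XENC}


def wrap_ciphertext(cipher_bytes):
    """Embed already-encrypted bytes as Base-64 text inside an EncryptedData element."""
    ET.register_namespace("xenc", XENC)
    encrypted = ET.Element(f"{{{XENC}}}EncryptedData")
    cipher_data = ET.SubElement(encrypted, f"{{{XENC}}}CipherData")
    cipher_value = ET.SubElement(cipher_data, f"{{{XENC}}}CipherValue")
    cipher_value.text = base64.b64encode(cipher_bytes).decode("ascii")
    return encrypted


def unwrap_ciphertext(encrypted):
    """Recover the raw cipher bytes from an EncryptedData element."""
    cipher_value = encrypted.find("xenc:CipherData/xenc:CipherValue", NS)
    return base64.b64decode(cipher_value.text)
```

Base-64 is what lets arbitrary binary cipher output sit inside a text-based XML document; the decryption side reverses the encoding before handing the bytes to the cipher.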


The W3C HTML Working Group has released the Last Call Working Draft of Modularization of XHTML in XML Schema. This spec provides a complete set of W3C XML Schema Language modules for XHTML, and allows document authors to modify and extend XHTML to build new, non-strictly conforming XHTML documents. Comments are due by January 31, 2003.


The W3C Amaya browser team has opened a contest to design the new Amaya Welcome page (the default page that the Amaya browser displays when launched). Entries must be valid, accessible, and showcase multiple Amaya capabilities. The winner gets bragging rights. Submissions are due by February 3, 2003.

Monday, December 9, 2002

I'm continuing to experiment with XHTML 1.1 for this page. In particular, I'm using the internal DTD subset to extend and replace some of the normal content models and attribute lists. You can check out your browser's support for that here. So far, here's my score card:

  • Mozilla 1.2.1 Mac: A+, no problems at all
  • Mozilla 1.1, Windows 2000: A+, no problems at all
  • Mozilla 1.0.1 Linux: A+, no problems at all
  • Internet Explorer 5.1.6 Mac: F, completely unable to display it. Downloads it to disk instead. (Note: This is the most current version of the most common browser for the Macintosh.) Also fails to render the CSS background colors properly.
  • Internet Explorer 6.0 SP 1 Windows 2000: F, completely unable to display it. Downloads it to disk instead. (Note: This is the most current version of the most common browser for Windows.) Also fails to render the CSS background colors properly.
  • Internet Explorer 5.1.3 Mac: F, completely unable to display it. Downloads it to disk instead.
  • Internet Explorer 4.0.1 Mac: F, completely unable to display it. Downloads it to disk instead.
  • Internet Explorer 5.00.2920.0000 Windows 2000: F, completely unable to display it. Downloads it to disk instead.
  • Netscape Communicator 4.75, Mac: F, completely unable to display it. Downloads it to disk instead.
  • Netscape Communicator 4.04, Mac: F, completely unable to display it. Downloads it to disk instead.
  • Netscape Communicator 3.0.1, Mac: F, completely unable to display it. Downloads it to disk instead.
  • Netscape Communicator 6.1, Mac: A, displayed with limited redraw problems
  • Netscape Communicator 6.2.1, Mac: A, displayed with limited redraw problems
  • Opera 5.0, Mac: D, displayed as plain text at user option
  • Opera 6.0 beta 3, Mac: A+, displayed correctly
  • Konqueror 3.0.3, Linux: F, completely unable to display it. Downloads it to disk instead.

The big question mark seems to be the MIME type I serve the page with. If I serve it with text/html everything pretty much works, except that all the browsers display the "]>" at the end of the internal DTD subset. If I serve it as application/xhtml+xml, then Mozilla derived browsers and Opera 6 and later work perfectly, but everything else works not at all. Given this, I really can't see using XHTML for web sites for several years at least.
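
One workaround, which is not something I'm doing on this page, would be to pick the MIME type per request based on the browser's Accept header. A minimal sketch in Python, assuming the server hands you the raw header string; the function name is made up, and it deliberately ignores wildcards such as */* because the browsers that choke on application/xhtml+xml tend to send exactly those.

```python
def choose_content_type(accept_header):
    """Return application/xhtml+xml only when the client lists it explicitly."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0                      # default q-value when none is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0          # treat a malformed q-value as "not acceptable"
        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"
    return "text/html"
```

A browser that never mentions application/xhtml+xml gets text/html, stray "]>" and all, while one that asks for it by name gets the real thing.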

What I'm doing turns out not to be strictly conforming XHTML (redefining parameter entity references in the internal DTD subset and adding new elements in the XHTML namespace are big no-no's) but the browsers that support XHTML don't seem to have any trouble with this. It is possible to create valid documents based on XHTML without going through the hassle of creating a new profile using the modularization framework. I'm not sure there's any point to doing this, though.


Speaking of Opera, Opera Software has posted the third beta of Opera 6.0 for MacOS and Mac OS X. This release adds shared library support and enables Java in the classic MacOS, but offers no major new features in the XML space. It still supports direct display of XML with CSS stylesheets. XSL is still missing in action. Opera is normally $39 payware or free-beer adware, but right now, there's a sale so you can buy it for $29.

Sunday, December 8, 2002

The Mozilla Project has released version 0.5 of Phoenix, a light-weight browser for Windows and Linux based on Mozilla's Gecko engine. It supports all the yummy XML features, but doesn't include the e-mail program, news reader, or nose hair trimmer. Phoenix differs from similar efforts like Galeon in that it's based on XUL and is designed for cross-platform release on Linux and Windows. (Mac OS X users should check out Chimera instead.) This is mostly a bug-fix and speed-up release.


Christian Neumann's posted LibXMLight 0.1.0, "a non-validating, lightweight XML Parser Library written in C++. The API is similar to SAX." LibXMLight is published under the GPL.

Saturday, December 7, 2002

The W3C Evaluation and Repair Tools Working Group has posted the first public working draft of Evaluation and Report Language (EARL) 1.0. EARL is:

a language to express test results. Test results include bug reports, test suite evaluations, and conformance claims. The test subject might be a Web site, an authoring tool, a user agent or some other entity. Thus, EARL is flexible. It enables any person, entity, or organization to state test results for any thing tested against any set of criteria.

Stating test results in EARL creates a variety of opportunities. The data can be--

  • exchanged between tools;
  • used to create reports;
  • combined to compare how different test subjects fared on the same test.

EARL is based on RDF.


The Apache XML Project has released version 2.0.4 of the Cocoon application server. "Apache Cocoon is an XML framework that raises the usage of XML and XSLT technologies for server applications to a new level. Designed for performance and scalability around pipelined SAX processing, Cocoon offers a flexible environment based on the separation of concerns between content, logic and style. A centralized configuration system and sophisticated caching top this all off and help you to create, deploy and maintain rock-solid XML server applications." Version 2.0.4 "is a maintenance release focusing on improved performance and robustness. In addition some bugs were fixed and new features were added." New features include:

  • The HTMLGenerator now accepts a JTidy configuration file for fine-grained control on the generated document.
  • New Logicsheet for use with InputModules.
  • Improved support for CLOB and BLOB columns in modular database actions.
  • New ZipArchiveSerializer to build zip files aggregating various sources as archive entries.
  • CocoonServlet upload behavior now configurable from the web.xml.
  • XScript now has better variable management: variables of request, session, global, and page scope are stored not in the XScriptManager, but as request, session, context attributes, or as XSP page field (respectively).

Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This release fixes a few bugs in the HTML parser.

Friday, December 6, 2002

IBM's alphaWorks has released version 1.2 of the XML Wrapper Generator, a graphical tool that integrates XML data sources into a DB2 database. The tool loads XML schema files, "shreds" them to a relational schema, and generates appropriate NICKNAME and VIEW statements. This is a bug fix release.

Thursday, December 5, 2002

I've fleshed out the XOM design principles document. It now explains the reasoning behind the various choices made in XOM.


Malcolm Wallace and Colin Runciman have released HaXml 1.08, an XML processing library for the Haskell language. According to the web page,

HaXml is a collection of utilities for using Haskell and XML together. Its basic facilities include:
  • a parser for XML,
  • a separate error-correcting parser for HTML,
  • an XML validator,
  • pretty-printers for XML and HTML.

For processing XML documents, the following components are provided:

  • Combinators is a combinator library for generic XML document processing, including transformation, editing, and generation.
  • Haskell2Xml is a replacement class for Haskell's Show/Read classes: it allows you to read and write ordinary Haskell data as XML documents. The DrIFT tool (available from http://repetae.net/~john/computer/haskell/DrIFT/) can automatically derive this class for you.
  • DtdToHaskell is a tool for translating any valid XML DTD into equivalent Haskell types.
  • In conjunction with the Xml2Haskell class framework, this allows you to generate, edit, and transform documents as normal typed values in programs, and to read and write them as human-readable XML documents.
  • Finally, Xtract is a grep-like tool for XML documents, loosely based on the XPath and XQL query languages. It can be used either from the command-line, or within your own code as part of the library.

HaXml is distributed under the Artistic License.


Alex Chaffee has updated his XPath Explorer to add "pop-down history lists, sample files menu, snazzier widget alignment, hierarchical tree document display," an expanded XPath field, and NetBeans and Eclipse plug-ins. This is an open source, graphical tool for displaying XML documents as trees and evaluating XPath expressions with respect to those documents.

This is a nice little toy for simple documents, but when I tried to use it in the Hands-On XSLT class at Software Development East last month it proved ungodly slow for real world documents like the periodic table example from the XML Bible and one DocBook chapter from the Processing XML with Java source.

Wednesday, December 4, 2002

The W3C Multimodal Interaction Working Group has published a note on the Multimodal Interaction Framework. This note is a very high-level description of how different inputs such as speech, handwriting, keyboards, and so forth can be connected up with different outputs such as audio, video, and screens within the same system. The goal is to allow content and processing to be decoupled from the specific input and output methods.


The W3C has released version 7.0 of Amaya, their open source, test bed web browser and editor for Windows and Linux that supports XML, XHTML, and CSS, as well as providing partial support for MathML and SVG. New features in this release include a history menu, the Raptor RDF parser, better support for XML, SVG, and CSS, an OpenGL version with support for SVG opacity and PNG transparency, and font anti-aliasing under Unix.


IBM's alphaWorks has released version 3.3 of their Web Services Toolkit. The "basic software components needed to create a Web services environment are provided with Web Services Tool Kit. Included is an architectural blueprint (Web Services Architecture), sample programs, Utility services, and some tools that are helpful in developing and deploying Web services. Extensive documentation is included to assist developers with the basic concepts of Web services. The tool kit also includes a fully-functioning Web services client API that can be used to directly access a UDDI registry." New features in 3.3 include the "Tivoli Management Web Services and Common Event Format, Federated Identity demo, Wide Spectrum Stress Tool, Reputation Protocol, WS-Inspection crawler utility, Pluggable Discovery Framework, Privacy Authorization Director, and Updated Utility Services." Java 1.3 or later is required.

Tuesday, December 3, 2002

The Mozilla Project has released Mozilla 1.2.1 in order to fix a "DHTML bug in Mozilla 1.2 which broke dynamically writing into a dynamically created element." There are no new features in this release. All 1.2 users should upgrade.


IBM's alphaWorks has released XincaML (eXtensible Inter-Nodes Constraint Mark-up Language). XincaML is a schema language that "can describe the presence or value dependencies amongst nodes located on different branches of an XML tree. It can specify constraints that can't be expressed by the XML schema language and therefore supplement the existing XML schema language to insure the integrity of data. XincaML is a declarative language and its syntax is based on XML format." I haven't looked at the language yet, but the general description of what it can do reminds me of Schematron. The XincaML validator is written in Java.


The W3C Voice Browser Working Group has posted the last call working draft of the Speech Synthesis Markup Language Specification. According to the abstract:

The Voice Browser Working Group has sought to develop standards to enable access to the Web using spoken interaction. The Speech Synthesis Markup Language Specification is part of this set of new markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms.

Monday, December 2, 2002

Michael Fitzgerald's written a nice introductory article about XOM for XML.com.


Daniel Veillard's released version 1.0.23 of libxslt, the GNOME XSLT library, and version 2.4.28 of libxml2, the GNOME XML parser for Linux. The new version of libxslt fixes bugs and cleans up the code and docs. The new version of libxml2 fixes a few bugs and adds support for alternate encodings when processing XIncludes with parse="text".


Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This is a collection of XML tools written specifically to take advantage of the XNI API in Xerces2 including the NekoHTML parser and the NekoDTD parser. This release makes a number of improvements and experiments in the HTML parser and corrects the naming conventions in the GeneralEntityEvent class.


Version 1.3.2 of the OpenJade DSSSL processor has been released. DSSSL is a style language for SGML and XML documents. OpenJade contains backends for various formats (RTF, HTML, TeX, MIF, SGML2SGML, and FOT). This is a maintenance release that "supports and is intended to be used with the latest version of OpenSP, currently 1.5. This means that openjade takes advantage of the features available in OpenSP 1.5. It also means that distributors can provide separate and independent 'packages' for OpenSP and OpenJade."

Wednesday, November 27, 2002

Weather permitting (and it may not) I'll be visiting family over the Thanksgiving break. Updates will probably be a little slow until next week.


The Mozilla Project has released Mozilla 1.2, the open source web browser for Windows, Mac, and Linux that natively supports XML, CSS, XSLT, XUL and lots of other cool acronyms. The big new XML feature in this release is a pretty printed source view for raw, unstyled XML, much like that used in Internet Explorer. Other new features in 1.2 include:

  • Type Ahead Find, a new feature that allows quick navigation when you type a succession of characters in the browser, matching the text in one or more links on the page. To give it a spin just go to a web page, start typing, watch the typed characters highlight as they find a match in a link and hit enter to load the selected link. You can also use it to search for any text on the page by typing / before your search text.

  • You can now show toolbars as text/icons/both in the default Classic theme. There are also a few other usability improvements such as image selection visualization (image highlights with system selection color when selected) and the removal of the confusing toolbar grippies.

  • Improvements to native look and feel in both the browser interface and the browser content area. Mozilla now supports most native GTK themes in Mozilla and the native look and feel for web content like form controls under Windows XP.

  • You can launch the browser with a group of bookmarks as your start page. This loads several pages into tabs at startup.

  • Keyboard access is improved with additional accesskeys for menus, other UI elements and page elements.

  • Document prefetching based on hints included in the page's link elements

  • Java compatibility with Mac OS 10.2 (Jaguar) has been repaired.


The W3C Quality Assurance (QA) Activity has updated three working draft specifications on quality assurance:

These describe "a common framework for enhancing the quality practices of the W3C Working Groups in the areas of specification editing, production of test materials, and coordination efforts with internal and external groups."


The W3C Device Independence Working Group has published a second last call working draft of Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies. This is an RDF vocabulary for describing user agent (browser) and proxy capabilities and preferences. Topics include:

  • The structure of client capability and preference descriptions
  • The structure of proxy behavior description
  • The use of RDF classes to distinguish different elements of a profile, so that a schema-aware RDF processor can handle CC/PP profiles embedded in other XML document types.

The CC/PP vocabulary uses URIs to refer to specific capabilities and preferences. It covers:

  • The types of values to which CC/PP attributes may refer
  • How to introduce new vocabularies
  • A client vocabulary covering print and display capabilities
  • A survey of existing work from which new vocabularies may be derived.

The W3C Cascading Style Sheets working group has published working drafts of two new modules for CSS Level 3:

New properties for borders since CSS2 include border-image, border-fit, border-image-transform, border-break, and box-shadow.

Changes in lists since CSS2 include:

  • display:marker has been replaced by the ::marker pseudo-element
  • It is no longer possible to make end markers.
  • The marker-offset property is obsolete.
  • The marker display type is obsoleted.
  • Markers are now aligned relative to the line box edge, rather than the border edge.
  • Markers now have margins.
  • Many new list style types as well as explicit algorithms for all list style types.
  • Error handling rules for unknown list style types were changed to be consistent with the normal parsing error handling rules.
  • A list-item predefined counter identifier

Tuesday, November 26, 2002

The W3C Web Services Architecture Working Group has updated the working draft of Web Services Architecture Requirements and published two new working drafts on Web Services Architecture and Web Services Glossary.

The Web Services Architecture "document describes the Web Service Architecture. The Web services reference architecture identifies the functional components, defines the relationships among those components, and establishes a set of constraints upon each to effect the desired properties of the overall architecture."

The Web Services Glossary "is a glossary of Web services terms intended to be used to describe the Web services architecture [WS Arch], and across the Web Services Activity."


IBM's alphaWorks has released XML Processing Plus Plus, a new "typed and stream-based XML processing language" that extends Java with new XML stream APIs: XmlIn and XmlOut. XmlIn retrieves data from XML input streams, and XmlOut inserts data into XML output streams. "XML Processing Plus Plus includes the xpppc compiler, which converts programs written in XML Processing Plus Plus syntax into standard Java byte code. The compiler supports type checking based on DTDs (document type definitions). The type checker reports semantic errors of XML manipulation against DTDs".


IBM has also updated their XML Parser for Java to version 4.1.2. This release is based on Xerces-J 2.2.0 and supports the W3C XML Schema Recommendation 1.0, SAX 1.0 and 2.0, DOM Level 1, DOM Level 2, and some experimental features of DOM Level 3 Core and Load/Save Working Drafts, JAXP 1.2, and XNI.

Monday, November 25, 2002

I've extracted all the examples from Processing XML with Java into individual files. You can download them as a zip archive if you like. I actually wrote an XSLT stylesheet to pull all the examples out of the chapters, dump them into individual files, and then generate the index files from the titles used in the book. I used ant to automatically apply the stylesheet and zip all the examples, so this is just a single part of the book's build process. (Question: it seems that Ant is missing one key feature that make has, the ability to detect whether a file on the disk has changed since the last build, and thus whether dependent files need to be regenerated or not. Ant seems to rebuild the entire project from scratch every time. Is there any way to avoid this?)
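For reference, the check make performs here amounts to comparing modification times: a target is rebuilt if it is missing or if any file it depends on is newer. A minimal sketch of that comparison in Java follows. The names StaleCheck, isStale, and needsRebuild are made up for this illustration and are not part of Ant or make.

```java
import java.io.File;

public class StaleCheck {

    // Core rule: stale if the target has no timestamp (missing file)
    // or if any source timestamp is later than the target's.
    static boolean isStale(long targetTime, long... sourceTimes) {
        if (targetTime == 0L) {
            return true; // File.lastModified() returns 0 for a file that does not exist
        }
        for (long sourceTime : sourceTimes) {
            if (sourceTime > targetTime) {
                return true;
            }
        }
        return false;
    }

    // Convenience wrapper that reads the timestamps from the file system.
    static boolean needsRebuild(File target, File... sources) {
        long[] times = new long[sources.length];
        for (int i = 0; i < sources.length; i++) {
            times[i] = sources[i].lastModified();
        }
        return isStale(target.lastModified(), times);
    }

    public static void main(String[] args) {
        System.out.println(isStale(2000L, 1000L));        // false: target is newer than its source
        System.out.println(isStale(2000L, 1000L, 3000L)); // true: one source is newer
        System.out.println(isStale(0L, 1000L));           // true: target does not exist yet
    }
}
```

In a build like the one described above, the sources would be the chapter files and the stylesheet, and the target would be one of the extracted example files.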

Several of the examples communicate with web services running on http://www.elharo.com. Unfortunately, between the time the book went to press and now, an upgrade to that server necessitated by security concerns broke a number of URLs published in the book. The services are still running, just not at quite the same URLs. I think in all cases you can access them by changing /fibonacci to /fibonacci/servlet. For instance, Example 3-10 and most of the examples in Chapter 5 attempt to communicate with a servlet running at http://www.elharo.com/fibonacci/XML-RPC. Instead you can connect to http://www.elharo.com/fibonacci/servlet/XML-RPC. In Example 3-11, you would change http://www.elharo.com/fibonacci/SOAP to http://www.elharo.com/fibonacci/servlet/SOAP and so forth.

I do not know why the new version of the Java Development Kit for the Cobalt Qube will not let me map the servlets to the shorter URLs. I just know that it won't. If anyone has a supposition as to how I might fix this so that the shorter URLs work again, please let me know. I've been tearing my hair out trying to fix this. This is using a special version of Tomcat 3.2.1 for Sun's Cobalt Qube. As near as I can tell it just doesn't pay any attention to the servlet mappings defined in the web.xml file like the old version did.

Sunday, November 24, 2002

The W3C Web Ontology Working Group has published three updated working drafts about the Web Ontology Language (OWL):

According to the Guide abstract,

The World Wide Web as it is currently constituted resembles a poorly mapped geography. Our insight into the documents and capabilities available are based on keyword searches, abetted by clever use of document connectivity and usage patterns. The sheer mass of this data is unmanageable without powerful tool support. In order to map this terrain more precisely, computational agents require machine-readable descriptions of the content and capabilities of web accessible resources. These descriptions must be in addition to the human-readable versions of that information.

The Web Ontology Language (OWL) is intended to provide a language that can be used to describe the classes and relations between them that are inherent in Web documents and applications.

The reference abstract further elucidates:

OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is derived from the DAML+OIL Web Ontology Language [DAML+OIL] and builds upon the Resource Description Framework [RDF/XML Syntax].

Saturday, November 23, 2002

I've posted updated notes from my final two talks at Software Development 2002 East this past week, XML Pull Parsing and DOM. The XML Pull Parsing talk was particularly fun. It includes a lot of new material on Andy Clark's NekoPull that hasn't gotten a lot of notice yet. Unlike XMLPULL, NekoPull is fully conformant to the XML specification in both interface and implementation. The API is also much more sensible than XMLPULL. It doesn't make all the compromises XMLPULL makes for J2ME environments in the name of size and speed.

However, the NekoPull API is far from perfect. There are definitely some weird spots. The two most obvious are short type constants and the use of public fields instead of getter and setter methods. There are also several dependencies on the Xerces Native Interface (XNI) that prevent it from being a truly generic API for other parsers. In the long term, I have high hopes for StAX, the Streaming API for XML, being developed in the Java Community Process, mostly because James Clark is on the expert group. I'll be revisiting and updating all of this in just about four months at Software Development West 2003 in Santa Clara. Pull parsing is an exciting space to be watching right now.
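For readers who haven't seen the pull style, here is a minimal sketch of the idea: the client asks the parser for the next event in a loop instead of registering callbacks. It assumes the javax.xml.stream package, the form in which StAX eventually reached the JDK. It is not NekoPull or XMLPULL code, and the PullSketch class and elementNames method are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullSketch {

    // Collects the local names of all start-tags, in document order.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // The client drives the parse: each call to next() pulls one event.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Malformed input surfaces here rather than in a callback.
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        System.out.println(elementNames("<bib><book><title>XML</title></book></bib>"));
    }
}
```

Because the loop lives in client code, it can stop early or hand the reader to another method partway through the document, which is the main attraction of pull parsing over SAX-style callbacks.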

Friday, November 22, 2002

Jonathan Borden has posted a draft of RDDL 2.0. RDDL is an XHTML-based vocabulary for human readable and machine processable documents placed at the end of namespace URLs. The major change in version 2.0 is the option to use RDF instead of XLinks. For example, in the old XLink style, a rddl:resource element might look like this:

<rddl:resource xlink:href="baseball.dtd"
  xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
    xlink:arcrole="http://www.rddl.org/purposes#validation">
  <div id="DTD" class="resource">
    <h3>DTD</h3>
    <p>A <a href="baseball.dtd">DTD</a> for baseball statistics</p>
  </div>
</rddl:resource>

In the new RDF style, it might look like this:

<rdf:Description rdf:ID="DTD" rddl:title="DTD">
  <purpose:validation>
    <rddl:resource rdf:about="baseball.dtd" >
      <rddl:nature 
        rdf:resource="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
      />
    </rddl:resource>
    <div id="DTD" class="resource">
      <h3>DTD</h3>
      <p>A <a href="baseball.dtd">DTD</a> for baseball statistics</p>
    </div>
  </purpose:validation>
</rdf:Description>

A second, slightly less ugly RDF option looks like this:

<rddl:resource ID="DTD">
  <rddl:title>DTD</rddl:title>
  <rddl:nature 
    resource="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"/>
  <rddl:purpose 
    resource="http://www.rddl.org/purposes#validation"/>
  <rddl:related resource="baseball.dtd"/>
  <rddl:prose>
    <div id="DTD" class="resource">
      <h3>DTD</h3>
      <p>A <a href="baseball.dtd">DTD</a> for baseball statistics</p>
    </div>
  </rddl:prose>
</rddl:resource>

The W3C RDF Core Working Group has updated six working drafts covering various aspects of the Resource Description Framework. According to the primer draft, "The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a 'Web resource', RDF can also be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning." The updated specs, in roughly the order you might want to read them, are:

  • RDF Primer "is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents."

  • RDF Semantics "is a specification of a precise semantics for RDF and RDFS, with some entailment results. It is intended to be readable by a general technical audience."

  • Resource Description Framework (RDF): Concepts and Abstract Syntax "defines an XML syntax for the Resource Description Framework (RDF) as amended and clarified by the RDF Core Working Group from that originally described in RDF Model & Syntax. The syntax is updated to be specified in terms of XML, XML Namespaces, the XML Information Set with new support for XML Base. The parts of the RDF/XML syntax are explained along with examples of how they work. The formal grammar is annotated with actions for generating the arcs that form the RDF graph as defined in the RDF Concepts and Abstract Syntax Working Draft. This is done using the N-Triples RDF Graph serializing format which enables more precise recording of the mapping in a machine processable form. These tests are gathered and published in the RDF Test Cases Working Draft."

  • RDF Vocabulary Description Language 1.0: RDF Schema "describes how to use RDF to describe RDF vocabularies. This specification also defines a basic vocabulary for this purpose, as well as conventions that can be used by Semantic Web applications to support more sophisticated RDF vocabulary description."

  • RDF Test Cases describes a set of machine-processable test cases for RDF which are available from a separate web page.

  • RDF/XML Syntax Specification (Revised) "defines an XML syntax for the Resource Description Framework (RDF) as amended and clarified by the RDF Core Working Group from that originally described in RDF Model & Syntax. The syntax is updated to be specified in terms of XML, XML Namespaces, the XML Information Set with new support for XML Base. The parts of the RDF/XML syntax are explained along with examples of how they work. The formal grammar is annotated with actions for generating the arcs that form the RDF graph as defined in the RDF Concepts and Abstract Syntax Working Draft. This is done using the N-Triples RDF Graph serializing format which enables more precise recording of the mapping in a machine processable form."

Thursday, November 21, 2002

The W3C XForms Working Group has posted the candidate recommendation of XForms 1.0. According to the abstract,

XForms is an XML application that represents the next generation of forms for the Web. By splitting traditional XHTML forms into three parts-XForms model, instance data, and user interface-it separates presentation from content, allows reuse, gives strong typing-reducing the number of round-trips to the server, as well as offering device independence and a reduced need for scripting.

One of the editors, Micah Dubinko, will be talking about this at SD Expo here in Boston tomorrow.


I've posted updated notes for yesterday's XLinks and Schemas seminars at Software Development 2002 East.

Wednesday, November 20, 2002

Michael Kay has released Saxon 7.3, a partial and experimental implementation of XSLT 2.0 written in Java. Changes include:

  • More support for new features in XSLT 2.0 and XPath 2.0
  • Support for the latest November 15, 2002 drafts
  • Compiled stylesheets are now serializable to disk and occupy less memory
  • Nodes can be annotated with type information to support schema validation.
  • Explicit FOP integration has been removed. (I always used ant for this anyway.)

This is for experimenters only. Most users should continue to use Saxon 6.5.2.


I've posted updated notes for yesterday's half-day Hands-On XSLT class at Software Development 2002 East. This covers basic XSLT 1.0 and XPath. This was the first time I've taught this as a hands-on session. It was fun, but a full day might have been more helpful. This class will probably next be offered at Software Development 2003 West in March.

Tuesday, November 19, 2002

The W3C has released version 6.4 of Amaya, their open source, test bed web browser and editor for Windows and Linux that supports XML, XHTML, and CSS, as well as providing partial support for MathML and SVG. Improvements in this release include improved Finnish and German localizations, HTTP Location header support, and many bug fixes.

Amaya's actually becoming a pretty nice browser. However, it's still ugly as sin. I know Amaya isn't really supposed to compete with Mozilla and IE, but it would be nice if some experienced screen designer felt like donating some time to cleaning up the icons and general GUI appearance.


I've posted updated notes for yesterday's half-day XML Fundamentals class at Software Development 2002 East. This covers basic XML, well-formedness, DTDs, validity, a little CSS, and namespaces. This class will probably next be offered at XML Web Services One London 2003 and then at Software Development 2003 West, both in March.

Monday, November 18, 2002

The W3C Technical Architecture Group (TAG) has published an updated working draft of Architecture of the World Wide Web. This is still incomplete, but makes for interesting reading. According to the introduction:

The World Wide Web (or, Web) is a networked information system consisting of agents (programs acting on behalf of another person, entity, or process) that exchange information.

This document organizes Web architecture into:

  1. Identification. Agents identify objects in the system (called "resources") with Uniform Resource Identifiers (URIs), defined in [RFC2396].
  2. Representation. Agents represent resources using a nonexclusive set of data formats, separately or in combination (e.g., XHTML, CSS, PNG, XLink, RDF/XML, SMIL animation). This section also discusses technologies for building new data formats (XML, XML Namespaces).
  3. Interaction. Agents exchange representations via protocols, including HTTP [RFC2616], FTP, and SMTP. Several of these protocols share a reliance on the Multipurpose Internet Mail Extensions (MIME) standards for the format of message bodies [RFC2045] and for Internet Media Types [RFC2046].

Sunday, November 17, 2002

The W3C XSLT and XQuery Working Groups have updated seven working drafts:

For XQuery, you should start by reading XML Query Use Cases. For XPath and XSLT 2.0, you should start with the XQuery 1.0 and XPath 2.0 Data Model assuming you're already familiar with XSLT 1.0. Otherwise, you should begin by learning XSLT 1.0.

As Murphy's Law requires, they did this just a couple of days before I have to leave for Boston to talk about exactly this at SD Expo. So much for the notes I had prepared. I guess I know what I'm reading on the plane now. From a first glance, here are the major changes in these releases:

XQuery 1.0 and XPath 2.0 Data Model

This document has been rewritten very heavily. You should probably just read it. It's much cleaner and more consistent. The key paragraph is the following:

Every value handled by the data model is a sequence of zero or more items. An item is either a node or an atomic value. A node is defined in 4 Nodes and is one of seven node kinds. An atomic value encapsulates an XML Schema atomic type and a corresponding value of that type. They are defined in 5 Atomic Values. A sequence is an ordered collection of nodes, atomic values, or any mixture of nodes and atomic values. A sequence cannot be a member of a sequence. A single item appearing on its own is modeled as a sequence containing one item. Sequences are defined in 6 Sequences.

However, data models are tricky things, and I really need to read this more carefully. At first glance, it does appear to require some basic notion of well-formedness, unlike the Infoset. For instance, it does require that each attribute of an element have a unique name, which the Infoset does not.
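The rule that a sequence cannot be a member of a sequence is easiest to see in code. The sketch below is a loose Java analogy, not part of any XQuery or XPath API: items are plain Objects, a List stands in for a sequence, and the Sequences class and its sequence method are hypothetical. It assumes, as XPath 2.0 does for the comma operator, that combining sequences splices their items together rather than nesting them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Sequences {

    // Builds a flat sequence. Any member that is itself a sequence (a List)
    // contributes its items instead of becoming a nested member.
    static List<Object> sequence(Object... members) {
        List<Object> flat = new ArrayList<>();
        for (Object member : members) {
            if (member instanceof List) {
                flat.addAll(sequence(((List<?>) member).toArray()));
            } else {
                flat.add(member); // a single item: a node or an atomic value
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Combining 1 with the sequence (2, 3) gives three items, not two.
        System.out.println(sequence(1, Arrays.asList(2, 3)).size()); // 3
        // A single item on its own is a sequence of length one.
        System.out.println(sequence("title").size()); // 1
        // The empty sequence has zero items.
        System.out.println(sequence().size()); // 0
    }
}
```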

XML Path Language (XPath) 2.0

  • Rules for matching values to types are now based strictly on type and element names rather than on structural subsumption.
  • Backward compatibility with XPath Version 1.0 is now handled in a different way; fallback conversions have been eliminated, and an "XPath Version 1 Compatibility Mode" has been introduced that affects the semantics of certain functions and operators.
  • New and more liberal casting rules allow (for example) a value of a derived atomic type to be cast into another derived atomic type as long as the two types have a common supertype. Furthermore, a new form of predicate called castable can now be used to determine if a given value can be cast into a given target type without raising an error.
  • Variables may now be added to the static and dynamic context by the environment external to the query or transformation.
  • New material has been added describing how errors are handled and how optimizers are allowed certain flexibility in evaluating an expression.

XQuery 1.0: An XML Query Language

In addition to the changes listed above for XPath 2.0,

  • The sort by expression has been eliminated and in its place a new order by clause has been added to the FLWR expression (which is now the FLWOR expression).
  • New syntax for explicit construction of text nodes.
  • An optional "positional variable" has been added to the for-clause of a FLWOR expression, to capture the position of each variable binding in the iteration sequence.

XML Query Use Cases

The primary change here is that all the examples now use the new XQuery syntax with order by clauses instead of sortby(). For example,

<bib>
  {
    for $b in document("http://www.bn.com/bib.xml")//book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    order by $b/title
    return
        <book>
            { $b/@year }
            { $b/title }
        </book>
  }
</bib>
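
For readers more at home in Java than XQuery, the following sketch computes roughly what the query above asks for. It assumes a JAXP version that includes javax.xml.xpath, and it runs against made-up sample data rather than the bib.xml document the query names. The BibQuery class and its methods are hypothetical. The for and where clauses collapse into a single XPath 1.0 expression; the order by clause has no XPath 1.0 equivalent, so the sort happens in Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BibQuery {

    // Made-up sample data standing in for the real bibliography.
    static final String SAMPLE =
        "<bib>"
        + "<book year='1994'><title>TCP/IP Illustrated</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book year='1992'><title>Advanced Programming in the Unix Environment</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book year='2000'><title>Data on the Web</title>"
        + "<publisher>Morgan Kaufmann Publishers</publisher></book>"
        + "<book year='1990'><title>An Older Book</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "</bib>";

    // Text of the first title element inside a book (assumes one exists).
    static String title(Element book) {
        return book.getElementsByTagName("title").item(0).getTextContent();
    }

    static List<String> query(String xml) {
        NodeList books;
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // for $b in //book where publisher matches and year > 1991
            books = (NodeList) xpath.evaluate(
                "//book[publisher = 'Addison-Wesley' and @year > 1991]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        List<Element> matches = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            matches.add((Element) books.item(i));
        }
        // order by $b/title, using String's natural ordering
        matches.sort(Comparator.comparing(BibQuery::title));
        List<String> result = new ArrayList<>();
        for (Element book : matches) {
            result.add(title(book) + " (" + book.getAttribute("year") + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : query(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

The comparison is not exact: the XQuery builds new book elements, while this sketch only returns strings, but the selection and ordering logic are the same.
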
XQuery 1.0 and XPath 2.0 Functions and Operators

  • A user-definable error function is called "whenever the semantics being described encounter an error other than a static type error."
  • "There has been some amplification of the rules for constructing simple types and for casting (see section 4 Constructor Functions and section 16 Casting Functions). This work is not yet complete and will be further elaborated in forthcoming versions of this specification."

XSL Transformations (XSLT) Version 2.0

The changes here are fairly minor and technical. The big one is that there is no longer a principal result tree. All "result trees now have the same status, though there is still an initial result tree created implicitly if the stylesheet does not create one using xsl:result-document." In addition, xsl:output supports Unicode normalization, and the type attribute of several elements has been renamed to as.

XQuery 1.0 and XPath 2.0 Formal Semantics

  • The type system only uses named typing which removes the need for structural matching on sequence types.
  • Some fixes to the semantics of function calls and operators.
  • Some fixes to the semantics of path expressions.

Saturday, November 16, 2002

The W3C Patent Policy Working Group has posted the last call working draft of the W3C Royalty-Free Patent Policy. Bottom line: "In order to promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented on a Royalty-Free (RF) basis. Under this policy, W3C will not approve a Recommendation if it is aware that Essential Claims exist which are not available on Royalty-Free terms." The word "Royalty-Free" is actually being used in a slightly unusual way here. To better understand the implications, replace "Royalty-Free" with "Zero-Cost".


Amazon has corrected their mispricing of Processing XML with Java. They now show the correct list price of $54.99, and they're selling it for 30% off at $38.49.


The W3C SVG Working Group has posted proposed recommendations of the Scalable Vector Graphics (SVG) 1.1 Specification. The abstract says, "SVG 1.1 serves two purposes: to provide a modularization of SVG based on SVG 1.0 and to include the errata found so far in SVG 1.0." A test suite is now provided for SVG.

The SVG Working Group has simultaneously posted Mobile SVG Profiles: SVG Tiny and SVG Basic. SVG Tiny is a stripped down version of SVG for cell phones. SVG Basic is a slightly larger version of SVG for PDAs. Comments on both specs are due by December 20th.

The same working group has also posted the first public working draft of Scalable Vector Graphics (SVG) 1.2. This is very preliminary, but possible new features for this version include:

  • A flowText element for wrapping text inside shapes
  • XForms support
  • XML Events support
  • More SMIL features possibly including audio, video, transitions and enhanced timing controls.
  • Rendering Arbitrary XML inside SVG documents using style sheets
  • A printing profile
  • Enhanced Alpha Compositing
  • Z-indexes not based on document order
  • Streaming enhancements
  • The solidColor element is a paint server that provides a single color with opacity. It can be referenced like the other paint servers (gradients and patterns).
  • Background fills
  • Keyboard navigation between picture elements
  • DOM Access to Images
  • Conversion of Mouse Coordinates to the corresponding user space coordinates
  • DOM Level 3 Events
  • A standard SVGWindow interface

The W3C DOM Working Group has released the proposed recommendation of Document Object Model (DOM) Level 2 HTML Specification. The abstract states, "This specification defines the Document Object Model Level 2 HTML, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of [HTML 4.01] and [XHTML 1.0] documents. The Document Object Model Level 2 HTML builds on the Document Object Model Level 2 Core [DOM Level 2 Core] and is not backward compatible with DOM Level 1 HTML [DOM Level 1]." Comments are due by October 16.


The W3C/IETF joint XML Signature Working Group has released the XML-Signature XPath Filter 2.0 Recommendation. According to the abstract, "XML Signature [XML-DSig] recommends a standard means for specifying information content to be digitally signed and for representing the resulting digital signatures in XML. Some applications require the ability to specify a subset of a given XML document as the information content to be signed. The XML Signature specification meets this requirement with the XPath transform. However, this transform can be difficult to implement efficiently with existing technologies. This specification defines a new XML Signature transform to facilitate the development of efficient document subsetting implementations that interoperate under similar performance profiles."


IBM's alphaWorks has released the Web Services Tool Kit for Mobile Devices. It enables the development of Java applications that talk to SOAP services on the PocketPC, Palm, and BlackBerry and C applications on the Palm.

Friday, November 15, 2002

The W3C XML Linking Working Group has posted proposed recommendations of three XPointer specifications:

The biggest change in these drafts is that non-W3C-defined scheme names must be namespace qualified, and those namespaces must be defined by xmlns parts. Simon St. Laurent has been pointing out on the xml-dev mailing list that this has stretched namespaces and qualified names pretty much well past the breaking point. The syntax proposed is so horribly unusable that it effectively dooms any hope of extending XPointer from outside the W3C. It's becoming obvious that instead of going to recommendation, XPointer should be tabled, and work on an XML addressing syntax for URI fragment identifiers should begin again from scratch. Unfortunately, the working group's charter expires at the end of the year, and they seem to prefer putting out a deeply flawed spec to putting out nothing at all.


Opera Software has posted the first public beta of Opera 7.0 for Windows, a $39 payware web browser that supports XML and CSS. This version is allegedly faster and more standards compliant. There's also a new e-mail client. There don't seem to be a lot of other impressive new features.


The Mozilla Project has posted version 0.6 of Chimera, a small footprint, native Cocoa Mac OS X web browser based on Mozilla's Gecko layout engine that includes lots of XML support. Unlike Mozilla, this is only a browser: no e-mail client, news reader, chat program, or dog walker. Changes in 0.6 include "improvements to plugins, keychain support, better cookie management, and Talkback support."


IBM's alphaWorks has updated their XSL Formatting Objects Composer, "a typesetting and display engine that implements a substantial portion of XSL Formatting Objects", to improve memory use, speed, and performance and to handle larger input documents. However, it doesn't handle quite as much of the XSL spec as earlier versions.

Thursday, November 7, 2002

I am pleased to announce the official publication of Processing XML with Java. This is the most comprehensive and up-to-date book about integrating XML with Java (and vice versa) you can buy. It contains over 1000 pages of detailed information on SAX, DOM, JDOM, JAXP, TrAX, XPath, XSLT, SOAP, and lots of other juicy acronyms. This book is written for Java programmers who want to learn how to read and write XML documents from their code.

Normally, this is the point where I'd spend a few paragraphs describing just what's in the book and how important it is to your education, your career, and your love life; but this time I've done something a little different. The entire book is available online. You can read every chapter and every page so you can see for yourself how well this book answers your questions such as, "Why does SAX truncate the text in my documents after a few thousand characters?", "How do I serialize a DOM Document object in an implementation-independent way?", or, "Why doesn't my significant other understand the importance of building a life-size Millennium Falcon in our backyard?". Consequently, I'll forego the usual hype. Check the book out for yourself; and if you like it, please buy a copy. I promise it's cheaper than printing all 1100+ pages on your laser printer.

I received my copy yesterday, and Amazon is reporting that it will be in stock tomorrow. They do have the wrong list price. It's $54.99, not $69.50, but their actual price is a quite reasonable $48.65. Barnes & Noble shows the list price as $49.99 (also wrong) and is selling it at $39.99. You may want to pre-order your copy today, because their initial shipments of my books tend to sell out very quickly once I announce them here. If you missed the first batch, don't worry. Addison Wesley will ship more very quickly. It does not normally take the advertised "2-3 weeks". Brick and mortar stores should have their copies very soon as well.

I'm heading off to Europe later today for a much needed vacation. (Well, a working vacation anyway. I'll be stopping by the Javapolis conference in Antwerp next Wednesday, November 13, to talk about Refactoring Java and the Top 10 Myths About Java I/O. It's only €150 and looks to be a really fun show. If you're not too far from Antwerp, check it out.) Consequently, updates will be slow to non-existent here for the next week. If you think I planned this just so my book hype would be front page news here for a week, well, OK. You caught me. :-) In the meantime, why don't you check out Processing XML with Java? If you like what you see, you can buy it at Amazon or any other purveyor of fine computer books.


On an unrelated note, elharo.com is down for the time being due to either Covad, Verizon, or Speakeasy problems (whose fault it is exactly has not been determined yet) and may not be fixed until I return late next week. Cafe au Lait and Cafe con Leche are both hosted by the friendly and much more skilled folks at IBiblio and should be fine.

Wednesday, November 6, 2002

SpaceMapper - DataStore is an open source "document repository server for storing, querying and fetching XML based documents. It is built on practical needs allowing the storage of semi-structured (well formatted, maybe validated, XML, XHTML and HTML) documents and un-structured documents (TXT)." The documents are stored in conventional relational databases such as PostgreSQL, MySQL, or DB2. SpaceMapper is written in Java on top of the Avalon Phoenix framework. The documents are managed through BEEP and/or XML-RPC interfaces using a subset of the Simple Exchange Profile (SEP) protocol.

SpaceMapper includes MN8, "an experimental object oriented scripting language, tightly integrated with the net, which emulates the concepts at the core of XML in order to simplify and make as transparent as possible information extraction and manipulation from the WWW and XML documents." To give you a bit of the flavor of the language, here's the first example from the tutorial that defines a "concept" (an MN8 programming structure that would probably be called a class in most languages) for a Person:

# -------- Person.mn8 ---------
define Person label "PersonalPersonDefinition" [
	@firstName
	@lastName	typeof String label "lastName"
	email label "Email.Address"
	address [
		street
		city
		state	
		country [
			@code
			value typeof String
		]
	]
	
	: getMyPerson [
		@firstName = "Remus"
		@lastName = "Pereni"
		/email = "remus@nolimits.ro"
		/address/street = "Some street"
		/address/city = "Satu Mare"
		/address/state = "no state J"
		/address/country@code = "RO"
		/address/country/value = "Romania"
	]
	
	static : main ( $args typeof Series ) [
		$me typeof Person
		$me.getMyPerson
		print $me.toXML
	]
]

MN8 is written in Java and includes concepts for HTML, HTML-Forms, Cookies, RSS, OPML, HTTP, FTP, POP3, SMTP, Jabber, BEEP, XML-RPC, SOAP, and MBox.

Tuesday, November 5, 2002

Jens Låås has released xmlclitools 1.42, four Linux command-line tools for searching, modifying, and formatting XML data. The tools are designed to work in conjunction with standard utilities such as grep, sort, and shell scripts. Version 1.42 adds sorting to xmlmod, new special property names to xmlfmt, and an RPM spec file. All four tools are published under the LGPL.

Monday, November 4, 2002

Opera Software has posted the second beta of Opera 6.0 for MacOS and Mac OS X. This release adds shared library support and enables Java in the classic MacOS, but offers no major new features in the XML space. It still supports direct display of XML with CSS stylesheets. XSL is still missing in action. Opera is normally $39 payware or free-beer adware, but right now, there's a sale so you can buy it for $29.

Sunday, November 3, 2002

I've posted version 1.0d8 of XOM, my open source, tree-based Java API for processing XML that strives for strict compliance to the XML specs. There are no breaking changes in this release. The big new feature is that XSLT works (modulo some obscure bugs in handling the undeclaration of the default namespace; I need to get some clarification on the proper behavior of SAX processors to fix this). As part of supporting XSLT, I discovered a need to undeclare the default namespace on a prefixed element. That is, <pre:name xmlns:pre="http://www.example.com" xmlns="">. You can now do this by passing an empty string for both the prefix and URI to declareNamespace().
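
To see what that undeclaration means to a namespace-aware parser, here is a small sketch. It uses Python's standard xml.etree.ElementTree rather than XOM itself, purely for illustration. The root and sibling elements and the http://www.example.org/default URI are made up for the example. The prefixed element keeps its own namespace, while its unprefixed child ends up in no namespace at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the pre:name start-tag comes from the text above.
DOC = (
    '<root xmlns="http://www.example.org/default">'
    '<pre:name xmlns:pre="http://www.example.com" xmlns="">'
    '<child/>'
    '</pre:name>'
    '<sibling/>'
    '</root>'
)

def namespace_of(tag):
    """Return the namespace URI from an ElementTree '{uri}local' tag, or '' if none."""
    if tag.startswith('{'):
        return tag[1:].split('}', 1)[0]
    return ''

root = ET.fromstring(DOC)
name = root[0]       # the prefixed element that undeclares the default namespace
child = name[0]      # unprefixed, inside the scope of xmlns=""
sibling = root[1]    # unprefixed, outside that scope

namespaces = {
    'root': namespace_of(root.tag),
    'pre:name': namespace_of(name.tag),
    'child': namespace_of(child.tag),
    'sibling': namespace_of(sibling.tag),
}
for label, uri in namespaces.items():
    print(label, '->', repr(uri))
```

The sibling element is there only to show that the undeclaration is scoped: outside pre:name, the default namespace is still in force.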

I'm travelling quite a bit in November so this is probably the last release until next month. The API, however, is starting to feel quite stable to me. Most of the things on my to do list involve implementation details, testing, documentation, benchmarking, optimization, and the like. One of the places I'm travelling this month is the SD Expo show in Boston. I'll be hosting a Birds of a Feather session there on "What's Wrong with XML APIs and How to Fix Them." I expect this to be more interactive than the XML SIG session in September.

Friday, November 1, 2002

The Apache XML Project has released version 2.4.1 of Xalan-J, an open source XSLT processor written in Java that supports XSLT 1.0 and TrAX. New features in this release include:

  • Performance fixes and enhancements to address some performance regressions between 2.3.1 and 2.4.0
  • A prototype implementation of the DOM Level 3 XPath Specification.
  • Additional EXSLT function support.
  • EXSLT support in XSLTC.
  • Nodeset and redirect extensions in XSLTC.
  • Command line access to XSLTC
  • Eliminated size limitation on XPath expressions.
  • Various bug fixes.

Thursday, October 31, 2002

Gal Binyamini has released JXV, an open source library that allows Java objects to be given "XML Views", and for those views to be read back into objects. (This strikes me as a little more plausible than the other direction in which you start with an XML document and build a custom Java object around it.) Essentially, this is another variation of object serialization using XML. JXV supports SAX input and output and DOM output. According to Binyamini,

JXV uses a pluggable architecture which allows XML view factories to be configured and loaded at runtime. The JXV configuration mechanisms also leverage XML namespaces to allow the configurations for those different view factories to be inlined within the JXV configuration file. In this release, JXV comes pre-configured with view factories for JavaBeans, collections, array, and "flat objects" such as Strings, primitives, etc. These factories support a wide variety of configuration options, and are sufficient for most object models. Future versions of JXV will include pre-configured support for additional factories. JXV may also release special-purpose factories (such as ones providing views for RowSets, ResultSets and other JDBC structures) as extension packages.
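
The object-to-XML-and-back direction is easy to picture with a toy example. The sketch below is not JXV and does not use its API. It is a minimal Python illustration of the general idea, with a hypothetical Person class and helper names (to_xml_view, from_xml_view) invented for the example. Each field becomes a child element, and the element can be read back into an equal object.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

@dataclass
class Person:
    # Hypothetical object model; only simple field types are handled here.
    first_name: str
    last_name: str
    age: int

def to_xml_view(obj):
    """Build an element named after the class, with one child element per field."""
    element = ET.Element(type(obj).__name__)
    for field in fields(obj):
        child = ET.SubElement(element, field.name)
        child.text = str(getattr(obj, field.name))
    return element

def from_xml_view(cls, element):
    """Read the child elements back, converting text with each field's declared type."""
    values = {}
    for field in fields(cls):
        values[field.name] = field.type(element.findtext(field.name))
    return cls(**values)

original = Person("Ada", "Lovelace", 36)
xml_text = ET.tostring(to_xml_view(original), encoding="unicode")
restored = from_xml_view(Person, ET.fromstring(xml_text))

print(xml_text)
print(restored == original)
```

A real library has to deal with nesting, collections, nulls, and configuration, which is presumably where the pluggable view factories come in.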

Andrew Watt reports that, "After thinking about the discussions we have had in the last few days about the difficulties of newcomers to XML getting a grasp on the important concepts of XML I have decided to set up a mailing list for newbies called, very imaginatively, 'XMLNewbies'. So if you know people who are new to XML you might want to point them in that direction." You can subscribe by sending an email to XMLNewbies-subscribe@yahoogroups.com.


IBM's alphaWorks has released the Multimodal Browser Extension, a plug-in that allows Internet Explorer to render multimodal applications written according to the W3C XHTML+Voice (X+V) note. "This technology, which includes IBM's automatic speech recognition and text-to-speech engines, allows testing of voice-enabled Web applications written in the X+V language." Windows 2000 or XP is required.

Wednesday, October 30, 2002

Sun's posted a beta of the Java Architecture for XML Binding 1.0 (JAXB) on the Java Developer Connection (registration required). JAXB compiles an XML schema into one or more Java classes. (First mistake: JAXB assumes there's a schema. Second mistake: It assumes the schema is written in the W3C XML Schema Language. Third mistake: It assumes documents actually adhere to the schema.) JAXB can unmarshal schema-valid XML into Java objects; read, update, and validate the Java objects against the schema; and write the result back out as XML.


The Mozilla Project has released version 0.4 (Oceano) of Phoenix, a light-weight browser for Windows and Linux based on Mozilla's Gecko engine. It supports all the yummy XML features, but doesn't include the e-mail program, news reader, or nose hair trimmer. Phoenix differs from similar efforts like Galeon in that it's based on XUL and is designed for cross-platform release on Linux and Windows. (Mac OS X users should check out Chimera instead.) Improvements in 0.4 include themes support, type ahead find, better pop-up blocking, toolbar customization, and tabbed browsing, as well as assorted bug fixes.

Tuesday, October 29, 2002

The W3C Web Services Description Working Group has posted the Last Call Working Draft of Web Service Description Requirements. According to the W3C web page, this "document describes definitions and requirements for specifying application to application communication." Comments are due by the end of the year.

Monday, October 28, 2002

Lucid'i.t. has released version 1.1 of their Lucid XML Editor, a web based document editor for Windows and Internet Explorer. I think this is free-beer.


Al Byers has released AG101 0.3.2.2, an open source, visual XSL editor/debugger written in Java. It is based on the Pollo XML editor. AG101 allows the user to visually set breakpoints in the XSL code and step through the source code. At breakpoints the values of variables, selects, etc. can be inspected.


Xerlin 1.2.1 is an open source XML Editor written in Java. Users can extend the application via custom editor interfaces for specific DTDs. Java 1.2 or later is required.


Simon St. Laurent has published The XPointer xpath1() Scheme, an IETF Internet-Draft that defines an xpath1() scheme for use inside the W3C's XPointer Framework. In essence this is the same as the existing xpointer() scheme after subtracting points and ranges. For example, today's news on Cafe con Leche could be identified as http://www.cafeconleche.org/#xpath1(//today). The news for Monday, October 28, could be http://www.cafeconleche.org/#xpath1(//*%5B@id='news2002October28'%5D). This is nothing that can't be done now with the xpointer() scheme, but it is a lot simpler to implement without the points or ranges.
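
The %5B and %5D in that second URI are just percent-escaped square brackets. As a rough sketch of how a client might resolve such a pointer, the following Python decodes the fragment, pulls out the XPath expression, and evaluates it against a tiny stand-in document. The stand-in page and the helper name xpath1_expression are assumptions for the example. ElementTree supports only a subset of XPath, so this is an illustration and not an implementation of the draft.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, unquote

def xpath1_expression(uri):
    """Extract the XPath expression from an xpath1() fragment identifier."""
    fragment = unquote(urlsplit(uri).fragment)   # %5B -> [ and %5D -> ]
    match = re.fullmatch(r"xpath1\((.*)\)", fragment)
    if match is None:
        raise ValueError("not an xpath1() pointer: " + fragment)
    return match.group(1)

uri = "http://www.cafeconleche.org/#xpath1(//*%5B@id='news2002October28'%5D)"
expression = xpath1_expression(uri)
print(expression)

# Hypothetical stand-in for the real page; the structure is invented.
page = ET.fromstring(
    "<html><body>"
    "<div id='news2002October28'>Monday's news</div>"
    "<div id='news2002October27'>Sunday's news</div>"
    "</body></html>"
)

# ElementTree wants a relative path, so "//..." is evaluated as ".//...".
matches = page.findall("." + expression)
print([div.text for div in matches])
```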


Open Wave has released version 6.1 of the OpenWave Software Development Kit, a Windows cell phone emulator for WAP 2.0 phones based on XHTML and CSS. For older phones, it also supports Wireless Markup Language (WML) 1.1, WML 1.3 with GUI Extensions for M-Services, WMLScript, WAP Push, cHTML, and HDML 3.0.

Sunday, October 27, 2002

Johannes Dobler's released version 1.2.9 of jd.xslt, an open source XSLT processor written in Java that supports most of the now defunct XSLT 1.1 working draft. This release fixes tail recursion and adds support for fragment identifiers in URIs for the document() function.


Michael Fuchs has posted version 0.3.1 of his DocBook Doclet that creates DocBook SGML and XML documents from JavaDoc. This release uses graphviz 1.8.9 to generate UML class diagrams.


Max Kellermann's LeanEdit 1.8.6 is an open source XML editor written in Java. Judging by the screen shots it appears to be form-based, and needs to be customized for different DTDs and schemas. LeanEdit is published under the GPL.


Pekka Enberg's posted version 0.1.9 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release fixes some bugs.

Saturday, October 26, 2002

The W3C Web Ontology Working Group has published the first public Working Draft of Web Ontology Language (OWL) Test Cases. "The draft illustrates correct OWL usage, the formal meaning of OWL constructs, and resolution of issues considered by the Web Ontology Working Group. OWL is used to publish and share sets of terms called ontologies, providing accurate Web search, intelligent software agents, and knowledge management."


The W3C Device Independence Working Group has posted the first public Working Draft of Authoring Challenges for Device Independence. "The document provides a discussion of several challenges that web site authors commonly face when making content and applications available to users with devices of various capabilities. The document examines the effects on authors and the implications for authoring techniques that assist in the preparation of sites that can support a wide variety of devices."

Friday, October 25, 2002

The W3C CSS working group published three new working drafts of CSS3 modules:

CSS3 module: Ruby

This document proposes a set of CSS properties for Ruby text used in Japanese to annotate other text, often for purposes of pronunciation. These properties include ruby-position, ruby-align, ruby-overhang, and ruby-span. This draft is in last call. Comments are due by November 27.

CSS3 module: text

This document describes the basic text formatting properties for CSS3 including writing-mode, direction, glyph-orientation-vertical, glyph-orientation-horizontal, unicode-bidi, text-script, text-align, text-justify, text-align-last, min-font-size, max-font-size, text-justify-trim, text-kashida-space, text-indent, line-break, word-break-CJK, word-break-inside, word-break, wrap-option, linefeed-treatment, white-space-treatment, all-space-treatment, white-space, text-overflow-mode, text-overflow-ellipsis, text-overflow, letter-spacing, word-spacing, punctuation-trim, text-autospace, kerning-mode, kerning-pair-threshold, text-underline-style, text-line-through-style, text-overline-style, text-underline-color, text-line-through-color, text-overline-color, text-underline-mode, text-line-through-mode, text-overline-mode, text-underline-position, text-blink, text-underline, text-line-through, text-overline, text-decoration, text-shadow, line-grid-mode, line-grid-progression, line-grid, text-transform, hanging-punctuation, and text-combine. Many of these should be familiar from CSS2. The new ones mostly address the needs of East Asian and bidirectional text. This draft is also in last call. Comments are due by November 27.

CSS3 module: The box model

According to the abstract:

CSS (Cascading Style Sheets) describe the rendering of documents on various media. When textual documents (e.g., HTML, WML) are laid out on visual media (e.g., screen, paper), CSS represents the elements of the document by rectangular boxes that are laid out one after the other or nested inside each other in an ordering that is called a flow. This module describes the characteristics of the flow and of the various kinds of boxes.

The flow includes "floating" boxes, but tables [CSS3TBL] and "absolute" and "fixed" positioning [CSS3POS] are described in other modules. Also, the rules for partitioning a flow into pages (for paged media) is described elsewhere [CSS3PAGE], as are the special boxes for ruby annotations [CSS3RUBY] and the multicolumn layouts [CSS3COL].

The box model builds on the inline text modules ([CSS3TEXT] and [CSS3LINE]), that describe how text is laid out on a line, including treatment of superscripts, bidirectional ("bidi") and vertical text.

The flow can be horizontal (typical for most languages), but in level 3 of CSS, flows can also be vertical (typical for the Uighur script and often used for ideographic scripts).

Thursday, October 24, 2002

The W3C Voice Browser Working Group has published the second working draft of Voice Browser Call Control: CCXML Version 1.0. According to the spec abstract, "CCXML is designed to provide telephony call control support for VoiceXML or other dialog systems. CCXML has been designed to complement and integrate with a VoiceXML system. Because of this you will find many references to VoiceXML's capabilities and limitations. You will also find details on how VoiceXML and CCXML can be integrated. However it should be noted that the two languages are separate and are not required in an implementation of either language. For example CCXML could be integrated with a more traditional IVR system and VoiceXML or other dialog systems could be integrated with some other call control systems."


TM4J 0.7.1 has been released. This is an open source topic map processing toolkit for Java as well as a set of topic map processing tools. Topic maps are an ISO standard for the interchange of information structures which can be used to represent ontologies, business data and processes, individual knowledge and opinions, and more. This engine processes files conforming to the XML Topic Maps (XTM) specification and stores them either in memory or in a persistent store, providing access via a Java API. This is a bug fix release.

Wednesday, October 23, 2002

The W3C DOM Working Group has posted a new working draft of Document Object Model (DOM) Level 3 Core Specification. The biggest new feature in this release seems to be support for providing type information for attributes and elements. DTD types are provided for attributes. Schema types are provided for both elements and attributes. Unlike the recently killed abstract schemas effort, the approach taken (just provide a type name and URI for each node) seems much more extensible and much less tied to particular schema languages. I think this is a clear case of doing something better by doing less.

Tuesday, October 22, 2002

Jens Låås has released xmlclitools 1.41, four Linux command-line tools for searching, modifying, and formatting XML data. The tools are designed to work in conjunction with standard utilities such as grep, sort, and shell scripts. Version 1.41 adds wildcard matching for xmlgrep, whitespace stripping from the ends of output strings, Makefile improvements, and some basic manpages. All four tools are published under the LGPL.


Monday, October 21, 2002

Version 1.1.2 of the XmlPull API has been released. I'll be talking about this (and other pull APIs for XML parsing) at SD2002 East in Boston in November. The major improvement in this release is that the XML declaration is no longer treated as a processing instruction. Version 1.1.2 also improves Java 2 Micro Edition (J2ME) compatibility and enhances XmlSerializer.
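
For readers who haven't met pull parsing, the defining trait is that the application asks the parser for the next event instead of registering callbacks. The sketch below does not use the XmlPull API (which is Java). It uses Python's standard xml.etree.ElementTree.XMLPullParser as a stand-in, and the function name pull_events and the sample document are made up. With only start and end events requested, the XML declaration produces no event.

```python
import xml.etree.ElementTree as ET

def pull_events(chunks):
    """Feed XML text to a pull parser in pieces and collect (event, tag) pairs."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []

    def drain():
        # The application pulls whatever events are ready; nothing is pushed to it.
        for event, element in parser.read_events():
            seen.append((event, element.tag))

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # pick up anything the parser was still holding back
    return seen

# Hypothetical input, deliberately split in the middle of a start-tag.
chunks = ['<?xml version="1.0"?><news><it', 'em>XmlPull 1.1.2</item></news>']
events = pull_events(chunks)
for event, tag in events:
    print(event, tag)
```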


The BulTreeBank Project has released an XPath Implementation Engine for Java. This is free-beer for non-commercial use.


The sixth beta of Luxor, a GPL'd XML User Interface Language (XUL) toolkit for Java, has been posted. Luxor includes a web server, a portal engine that supports RSS, the Velocity template engine, a Python interpreter, and more. Beta 6 adds:

  • More CSS properties such as border (e.g. solid, double, groove, ridge, inset, etc.), font, background, and color
  • New tags including deck, stack, mask and window
  • Ad-hoc context popup menus
  • XHTML 2.0 like href and target attributes
  • Toggle and radio buttons
  • Checkbox menu items
  • about, portal, and portlet URLs

Sunday, October 20, 2002

The W3C User Agent Accessibility Guidelines Working Group has published the proposed recommendation of User Agent Accessibility Guidelines 1.0. According to the abstract, "This document provides guidelines for designing user agents that lower barriers to Web accessibility for people with disabilities (visual, hearing, physical, cognitive, and neurological). User agents include HTML browsers and other types of software that retrieve and render Web content. A user agent that conforms to these guidelines will promote accessibility through its own user interface and through other internal facilities, including its ability to communicate with other technologies (especially assistive technologies). Furthermore, all users, not just users with disabilities, are expected to find conforming user agents to be more usable."

Saturday, October 19, 2002

IBM's alphaWorks has released the XML Wrapper Generator, a graphical tool that integrates XML data sources into a DB2 database. The tool loads XML schema files, "shreds" them to a relational schema, and generates appropriate NICKNAME and VIEW statements.


alphaWorks has also released WSDL Explorer, a Windows application that displays Web Services Description Language (WSDL) documents, "generates views of operations, allows invocation of operations, and allows viewing of sample message flow."


Daniel Veillard's released version 1.0.22 of libxslt, the GNOME XSLT library, and version 2.4.26 of libxml2, the GNOME XML parser for Linux. The new version of libxslt updates the Windows makefiles, adds a security module, supports a few new options to xsltproc, adds a per transformation error handler, and fixes a few bugs. The new version of libxml2 works better with Windows CE and fixes some bugs with validation for both DTDs and schemas.


Friday, October 18, 2002

The Mozilla Project has posted the first beta of Mozilla 1.2, an open source web browser that supports XML, simple XLinks, MathML, CSS, XSLT, XHTML, XUL, SVG, and many other cool acronyms. Most importantly it lets you turn off pop-up ads and block web bugs and cookies in a sensible way. (IE claims to let you manage your cookies, but it only works about 80%. Mozilla's cookie management is much smoother.) New features in this beta include:

  • Link prefetching
  • "Filter after the fact" for mail already received
  • Toolbars can be shown as text, icons, or both.
  • You can launch the browser with a bookmark group as your start page. This loads several pages into tabs at startup.

Most importantly for me personally, this release finally fixes a long standing AppleScript bug that prevented me from switching over to Mozilla on the Mac. I can finally remove IE from my work chain completely. Update: I spoke too soon. That bug is indeed fixed, but as often happens in software development, fixing one bug reveals another. This one isn't as bad as the last one—it only affects the quote of the day, not the recommended reading—but it still means Mozilla can't do quite everything I need it to do.

Other features added since 1.1 include "Type Ahead Find" and a pretty printed raw XML view, like that found in Internet Explorer. XML pretty printing is only available in the .zip distribution and is turned off by default because it affects the DOM for unstyled XML-pages. To turn it on, add user_pref("layout.xml.prettyprint", true); to your user.js file.

However, Mozilla is still a fairly large and monolithic web browser/e-mail program/news reader/chat client/application platform/child minder/dog washer/nose hair trimmer and probably always will be. If you'd like to try a leaner, meaner browser-only application, you should check out the recently released Phoenix 0.3 instead. This browser is based on Mozilla's Gecko engine so it supports all the yummy XML features, but doesn't include the e-mail program, news reader, or nose hair trimmer. Phoenix differs from similar efforts like Galeon in that it's based on XUL and is designed for cross-platform release on Linux and Windows. (Mac OS X users should check out Chimera instead.)

Thursday, October 17, 2002

The W3C Web Services Architecture Working Group has published the third public working draft of Web Services Architecture Requirements. According to the abstract,

The use of Web services on the World Wide Web is expanding rapidly as the need for application-to-application communication and interoperability grows. These services provide a standard means of communication among different software applications involved in presenting dynamic context-driven information to the user. In order to promote interoperability and extensibility among these applications, as well as to allow them to be combined in order to perform more complex operations, a standard reference architecture is needed. The Web Services Architecture Working Group at W3C is tasked with producing this reference architecture.

This document describes a set of requirements for a standard reference architecture for Web services developed by the Web Services Architecture Working Group. These requirements are intended to guide the development of the reference architecture and provide a set of measurable constraints on Web services implementations by which conformance can be determined.

My favorite part of this document is that it actually defines what the heck a web service is:

Definition: A Web service is a software application identified by a URI, whose interfaces and bindings are capable of being defined, described, and discovered as XML artifacts. A Web service supports direct interactions with other software agents using XML based messages exchanged via internet-based protocols.

In the past, I've noticed that how a web service is defined often depends on what a vendor is trying to sell me. Notably absent from this definition is any requirement to use HTTP, SOAP, WSDL, UDDI, or similar FLAs (four-letter acronyms).


On a related note for all the developers who keep asking me to announce their web services products (and you are legion): if you want your product announced here you need to be able to explain in one paragraph what the product is and what it does. (Also required: what platforms it runs on and what it costs, though those requirements don't seem to be causing people as much trouble as explaining what their products actually do.) Do not use adjectives or adverbs, especially comparatives or superlatives (better, faster, more robust, efficiently, etc.). Do not define your product by comparison to some other product since I probably don't know what that product does either. Do not use buzzwords like "web services" or "application server". (I once walked across an Internet World show floor asking each and every booth selling an "application server" what an application server was. Most frequent answer: "If you come back later, the right person to answer your question will be here.") Explain in plain language what your product does and why a developer might need such a thing.

An example of what not to send me, adapted from a recent e-mail (names changed to protect the guilty, and because they're hardly the only group in this space that can't seem to explain what they're doing):

Cherokee Allies 1.0, the Open Source product of the XML Cherokee Group is now released. Allies, which Cherokee's John Doe likes to call "Cherokee SOAP 3.0", is quite a bit more powerful than Cherokee SOAP 2.0. Like SOAP 2.0, Allies supports the latest SOAP 1.1 spec. However Allies also supports WSDL 1.1. Allies includes implementations of both the JAX-RPC and SOAP API with Attachments for Java (SAAJ) specifications. Importantly, Allies is making some significant contributions with highly useful features that promise to improve interoperability and capabilities of future Web services as a whole.

An example of what to do:

Cherokee Allies 1.0 is a generic server written in Java that communicates with remote clients by sending and receiving XML documents over HTTP. These documents adhere to the SOAP 1.1, JAX-RPC, and SOAP API with Attachments for Java (SAAJ) specifications. Out of the box, Allies doesn't do much of anything. Sites customize Allies by writing small Java programs called "foolets" that respond to particular kinds of SOAP messages. The foolets are written in Java, and can do essentially anything a Java program running on that server can do; for example, talk to a database with JDBC, read a file, send data to a printer connected to the parallel port, invert a matrix, etc. The results of the foolet's work are then transmitted back to the requesting client as another XML document. The messages the server understands and responds to are described by a WSDL document that client programmers can retrieve and inspect. Allies handles all the generic services involved in sending and receiving HTTP, marshalling and unmarshalling arguments to XML documents, and, optionally, supporting transactions. The foolet programmer can focus on the unique local logic of their system. Client programs can be written in any language capable of generating and receiving XML documents over HTTP. Allies is published under the Cherokee license.

Wednesday, October 16, 2002

From the beast that wouldn't die department, the W3C XML Core Working Group has brought forth the candidate recommendation of XML 1.1; and, surprise, surprise, it's even worse than the last draft. This release has a few big new features:

  • C0 control characters such as form feed, vertical tab, BEL, and DC1 through DC4 (whatever those are) are now allowed in XML text. However, they must be escaped as character references. They cannot be included literally in data. Nulls, thankfully, are still forbidden.

  • The C1 control characters such as BPH, IND, NBH, and PU1 are no longer allowed as literals in XML text. They too must now be escaped as character references. For the first time this means that some well-formed XML 1.0 documents are not well-formed XML 1.1 documents. The exception, of course, is IBM's holy grail of NEL, which will be allowed in literal XML text, just to make life difficult for every text editor on the planet except those from IBM mainframes.

  • Unicode character normalization should be performed on XML documents, unless you don't feel like it, in which case you can ignore it. This almost makes sense. Basically it says that parsers may report an e followed by a combining accent acute instead of the single character é as an error of unspecified type if they want to or the client asks for it. The details are quite complicated, but at least it's optional. However, I still worry that this is a source of interoperability problems, especially when it comes to names of elements and attributes. For instance, a normalizing validator might accept documents a non-normalizing validator would reject.
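To make the normalization point concrete, here is a minimal sketch using the JDK's java.text.Normalizer. It is not taken from the XML 1.1 draft; the class and method names are made up for illustration, and it only shows why two names that look identical on screen can compare as different strings unless both sides normalize.

```java
import java.text.Normalizer;

final class NormalizationDemo {
    /** True if the string is already in Unicode Normalization Form C. */
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    /** Composes base characters with their combining marks where possible. */
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "caf" + "e\u0301"; // e followed by a combining acute accent
        String composed = "caf" + "\u00e9";    // the single precomposed character

        // The two strings render the same but are different code point sequences.
        System.out.println(decomposed.equals(composed));        // false
        System.out.println(isNfc(decomposed));                  // false
        System.out.println(toNfc(decomposed).equals(composed)); // true
    }
}
```

If these were element names, a tool that normalizes before comparing would treat them as the same name, and one that does not would treat them as two different names.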

And of course all the other problems previous drafts have had are still present. I've already calumnied these sufficiently in the past. Let me just reprint my criticisms now. What follows was originally posted on June 21, 2001. Regrettably, it's just as relevant today:

This is a proposal for a new backwards incompatible version of XML. The specific goal is to address some shortcomings of the XML 1.0 character model relative to Unicode 3.1, as well as throwing a sop to IBM.

The concern with respect to IBM is that one of the world's largest corporations, with thousands of patents, legions of programmers, billions of dollars in revenue, and resources pouring out of every orifice is somehow unable to handle documents where lines end with carriage returns and line feeds, as documents do on every non-IBM system on the planet. The only reason there's a problem here at all is because IBM tried to go it alone as a monopoly and set standards by fiat for years rather than working with the rest of the industry. Consequently their mainframe character sets don't really interoperate well with everybody else's character sets. In XML this arises as a problem with line endings when someone edits an XML document with an IBM mainframe text editor. IBM mostly grew out of their anti-competitive monopolistic tendencies over the last thirty years (with a large dose of assistance from the U.S. government). However, there are still some legacy issues relating to their attempt to dictate standards to the rest of the industry, and this is one of them. Now rather than fixing their own broken mainframe text editing software, they want everyone else on the planet to change their software so IBM doesn't have to. (If this reminds anybody of the current mess with Oracle and UTF-8, you're not alone.) This proposal was laughed out of the W3C a few months ago when IBM made it, or at least it seemed to be. However, it's now risen from the dead as part of XML Blueberry; but it doesn't make any more sense now than it did then; and it still deserves to be laughed off the table with whooping cries of derision.

The second proposal for breaking backwards compatibility with existing parsers is much more serious, and requires a more thoughtful response. Starting in Unicode 3.0 a number of new characters have been added both for new scripts that were previously unencoded such as Amharic and Cherokee as well as for old scripts that were incomplete such as Chinese. The concern is that since XML 1.0 is based on Unicode 2.0, "fully native-language XML markup is not possible in at least the following languages: Amharic, Burmese, Canadian aboriginal languages, Cantonese (Bopomofo script), Cherokee, Dhivehi, Khmer, Mongolian (traditional script), Oromo, Syriac, Tigre, Yi. In addition, Chinese, Japanese, Korean (Hangul script), and Vietnamese can make use of only a limited subset of their complete character repertoires."

If this were true, it would be a very serious criticism of XML 1.0. Fortunately, however, the claim is not nearly as dire as the proposal makes out. Indeed the proposal substantially overstates the need for any changes. The XML 1.0 BNF productions do not allow these newly defined characters to be used in element, attribute, and entity names. However, they can be used in the text of element content and attribute values. This means that XML is fully adequate for literature and data in Amharic, Burmese, Canadian aboriginal languages, Cantonese, Cherokee, Dhivehi, Khmer, Mongolian, Oromo, Syriac, Tigre, Yi, Mandarin, Japanese, Korean, and Vietnamese. Only the markup, that is, the tags, would have to be written in another script. Given that there aren't even localized operating systems in most of these languages, and that today's software effectively requires users to have a solid knowledge of at least the ASCII characters, I don't think the need to write markup (as opposed to text) in Cherokee justifies breaking backwards compatibility.
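The difference between markup and text is easy to try out. The following is a rough sketch that feeds two tiny documents to whatever JAXP parser is installed: one with Cherokee letters in element content, one with the same letters as the element name. The class and method names are invented for this illustration. The first document should always parse. A parser that enforces the XML 1.0 name productions described above should reject the second, though the result depends on the parser you run it with.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class MarkupVersusText {
    /** Returns true if the installed JAXP parser accepts the document as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String cherokee = "\u13E3\u13B3\u13A9"; // three Cherokee letters

        // Cherokee as text inside an ASCII-named element.
        System.out.println(isWellFormed("<word>" + cherokee + "</word>"));

        // Cherokee as the element name itself; the outcome is parser dependent.
        System.out.println(isWellFormed("<" + cherokee + "/>"));
    }
}
```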

But wait! It's not even that bad. Several of the languages listed are total red herrings. You most certainly can write markup in Cantonese, Japanese, Korean, Mandarin, and Vietnamese today. The new characters Unicode has added to these scripts are very obscure. In fact, experts often disagree over whether some of them exist at all, or are merely typographical variations of existing characters. Since the 1700s Vietnamese has been written in a Latin-based alphabet that is fully available in XML and that can write any Vietnamese word. Vietnamese only uses the Han ideographs for classical documents and occasional signage or decoration, and it seems very unlikely that a Vietnamese speaker would write their markup using Han ideographs. Japanese has not one but two phonetic alphabets that can write any Japanese word if the right Han ideograph character is not encoded. Chinese speakers can use either Latin characters or the native Bopomofo phonetic system for the very rare cases where a character they need is not encoded. The fact is most native speakers of Chinese, Japanese, Korean and Vietnamese do not recognize the vast majority of these new characters, and the need for them in markup (again, as opposed to text) is non-existent.

There are a few good points in this proposal. I'm sure there's an occasional need for writing markup in Amharic, Burmese, Khmer, Mongolian, Yi, and a few of the other languages the proposal lists. But I don't believe there's enough of a need to justify breaking compatibility with existing XML parsers, software, and systems. The XML Blueberry Requirements vastly overstate the case by ignoring the difference between markup and text in XML documents. I'd be willing to break backwards compatibility to allow text in these languages if we had to, but we don't. Text is already adequately handled by XML 1.0. All we're arguing about now are the tags, and that's just not a strong enough reason to break backwards compatibility.

Tuesday, October 15, 2002

Torsten Bronger's tbook 1.4 "typesets XML documents with high-level LaTeX. (X)HTML/MathML and DocBook output is also possible. It is based on the LaTeX-like tbook DTD developed for this project, XSLT transformations, and other tools, including a Unicode extension for LaTeX."


Better late than never. I've posted the complete batch of examples from the second edition of XML in a Nutshell here on Cafe con Leche. New examples in this edition cover schemas and RDDL.

Monday, October 14, 2002

The Microsoft WebData XML Tools team has released an online, HTML form-based interface for validating schemas (XSD & XDR) and instance documents using the System.Xml.XmlValidatingReader class in the .NET framework.

Sunday, October 13, 2002

I've posted version 1.0d6 of XOM, my open source API for processing XML with Java. The major additions to this release are two new packages that offer partial but significant support for XInclude and Canonical XML.

This release makes very limited backwards incompatible changes to the API. (A few formerly public methods in Serializer are now protected.) Almost all code that previously compiled and ran with 1.0d4 and 1.0d5 should still compile and run. Other new features in the API in this release include:

  • Namespace URIs must now be absolute URI references
  • Element.toXML now generates empty-element tags for empty elements
  • Serializer has four new protected methods to provide subclasses with more access to the underlying OutputStream:
      protected final void writePCDATA(String text) throws IOException
      protected final void writeAttributeValue(String value) throws IOException
      protected final void writeMarkup(String text) throws IOException
      protected final void breakLine() throws IOException

In addition, several bugs were fixed. XOM is published under the LGPL.

Saturday, October 12, 2002

RenderX has released version 3.0 of XEP, its $999.95 XSL Formatting Objects to PDF and PostScript converter. New features in version 3 include:

  • support for right-to-left writing mode (including a preliminary implementation of bidirectionality);
  • support for rotated text;
  • support for side floats;
  • support for blocks spanning multiple columns;
  • support for page-position="last" in conditional page masters;
  • support for horizontal text scaling (via font-stretch attribute);
  • support for from-table-column() function;
  • a number of useful extensions to XSL spec:
  • a validation mechanism that checks XSL FO data
  • SAX 2 interfaces for both input and output
  • JAXP integration classes, and a command-line interface to perform direct XML+XSL->PDF/PostScript transformation

Thursday, October 10, 2002

The W3C DOM Working Group has posted a new working draft of Document Object Model (DOM) Level 3 Validation Specification Version 1.0. According to the abstract, "This module provides the guidance to programs and scripts to dynamically update the content and the structure of documents while ensuring that the document remains valid, or to ensure that the document becomes valid."

Wednesday, October 9, 2002

Syncro Soft has released version 1.2.3 of <oXygen/>, a $65 payware XML editor written in Java that can run as an applet. <oXygen/> 1.2 supports XSLT and XSL-FO, among other features. Version 1.2.3 adds about a dozen small new features including color printing and find support for regular expressions.

Tuesday, October 8, 2002

The W3C DOM Working Group has released a new candidate recommendation of Document Object Model (DOM) Level 2 HTML Specification. Comments are due by October 16. "This version updates the 5 June 2002 version based on the feedback from the implementers and the results of the DOM Level 2 HTML Test Suite."


Jens Låås has released version 1.4.0 of xmlclitools, a set of four Linux command-line tools for searching, modifying, and formatting XML data. The tools are designed to work in conjunction with standard utilities such as grep, sort, and shell scripts. Version 1.4.0 adds negated queries and multiproperty queries to xmlgrep. They are published under the LGPL.

Monday, October 7, 2002

The W3C XML Encryption Working Group has posted proposed recommendations of XML Encryption Syntax and Processing and the Decryption Transform for XML Signature. Comments are due by October 31.


The Mozilla Project has posted version 0.2 of Phoenix, a lightweight browser that's based on Mozilla but is just a browser, no e-mail program, no news reader, no kitchen sink. This is good news for those of us who love the Mozilla browser, but use different e-mail and news programs. Phoenix differs from similar efforts like Galeon in that it's based on XUL and is designed for eventual cross-platform release. Version 0.2 is only available on Windows and X86 Linux. New features in version 0.2 include:

  • Web form auto-complete
  • Downloads Sidebar
  • Bookmarks Sidebar
  • History Sidebar
  • Extension management
  • Toolbar customization
  • Search bar
  • Preferences support for Proxies, tabbed browsing, scripts, and image looping.
  • Ctrl+Mousewheel resizes fonts

Sunday, October 6, 2002

I've posted version 1.0d5 of XOM, my open source library for processing XML with Java. This release marks a milestone in XOM's development: it is the first to be backwards compatible with the previous release. I'm not promising there won't be any breaking changes in future releases yet. However, the API does feel like it's stabilizing. This release added several useful new methods to Builder, fixed numerous bugs in serialization, and made the Attribute.Type class safe for multiple-class loader environments.

Saturday, October 5, 2002

Version 0.4.0 of the open source FOA (Formatting Object Authoring tool) has been released. FOA is a Java application "that gives users a graphical interface to author XSL-FO stylesheets. With FOA you can generate pages, page sequences and fill them with content provided into one or more XML files. FOA will generate the XSLT stylesheet that transforms the XML content into an XSL-FO document." New features in 0.4.0 include an absolute positioning brick, page number formats and counting, additional US page formats, and bug fixes.


Reed Esau's Ptarmigan Media Parser for XML 0.2 produces SAX events from the metadata found in (non-XML) media files and streams. Supported formats include ID3v1, ID3v2, Vorbis/Ogg, FLAC, WMA, M3U, PLS, ASX, and B4S. Ptarmigan is published under a BSD license.


Pekka Enberg's posted version 0.1.8 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release fixes some bugs.

Friday, October 4, 2002

Chapsoft's released EZxslt 1.0, a group of XSLT stylesheets that automatically generates Microsoft Word documents fitting a given template from a FileMaker Pro 6 database. It works with Windows and Mac versions of FileMaker and Word, and requires Word 97 or later. EZxslt is $129.95 payware for a 5 user version.

Thursday, October 3, 2002

Henry S. Thompson has released a new version of his open source XSV schema validator. This version refactors the code to make it "PPC (Python Politically Correct). New functionality includes command-line settable optional invocation control of top-level element name and/or type, partial support for the 'pattern' facet." XSV is published under the GPL.

Wednesday, October 2, 2002

Patrick Durusau and Matthew Brook O'Donnell have published a paper on Just-In-Time-Trees. The basic idea seems to be that you can extract subtrees of a document, and use them as complete documents, while ignoring the rest of the document. A Saxon 7.2 implementation is provided, though ideally a custom parser would be much more efficient. John Cowan suggested something similar he calls "Shemp" for XOM (after Simon St. Laurent's MOE). It's an interesting idea. Just-In-Time-Trees have the potential to be as easy to use as a tree-based API like JDOM or DOM while as fast and efficient as a streaming API like SAX or XMLPULL. I'm still trying to figure out exactly what the API for such a thing should look like before I work on the implementation.
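To give a feel for the idea, here is one possible sketch built on plain SAX and DOM from the JDK. It is not the API from the paper, and not Shemp or XOM; SubtreeExtractor and extract are hypothetical names. The parser streams through the whole document, but a small tree is built only for each element with the requested name. Each such subtree becomes its own Document, is handed to a callback, and can then be garbage collected.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SubtreeExtractor {
    /** Streams through xml and passes each element called name to the callback as a standalone tree. */
    static void extract(String xml, String name, Consumer<Element> callback) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            DefaultHandler handler = new DefaultHandler() {
                private Node current; // null while we are outside any wanted subtree

                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (current == null && !qName.equals(name)) return; // ignore the rest of the document
                    Document doc = (current == null) ? builder.newDocument() : current.getOwnerDocument();
                    Element element = doc.createElement(qName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        element.setAttribute(atts.getQName(i), atts.getValue(i));
                    }
                    (current == null ? doc : current).appendChild(element);
                    current = element;
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (current != null) {
                        String text = new String(ch, start, length);
                        current.appendChild(current.getOwnerDocument().createTextNode(text));
                    }
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    if (current == null) return;
                    Node parent = current.getParentNode();
                    if (parent.getNodeType() == Node.DOCUMENT_NODE) {
                        callback.accept((Element) current); // subtree complete: hand it off
                        current = null;
                    } else {
                        current = parent;
                    }
                }
            };
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><note>skipped</note>"
            + "<book id='1'><title>A</title></book>"
            + "<book id='2'><title>B</title></book></catalog>";
        extract(xml, "book", book ->
            System.out.println(book.getAttribute("id") + " " + book.getTextContent()));
    }
}
```

This version ignores namespaces, comments, and processing instructions, and it treats a wanted element nested inside another wanted element as an ordinary child. A real design would have to decide all of those questions.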

Tuesday, October 1, 2002

Antenna House, Inc has released version 2.3 of XSL Formatter, an XSL Formatting Objects (XSL-FO) ActiveX control and browser for Windows. Version 2.3 focuses on expanding multilingual support and adds support for fo:float. XSL Formatter is payware, starting at $1980 for a standalone license. PDF output raises the price to $4480. (No, I didn't leave out any decimal points.)

Monday, September 30, 2002

I've posted version 1.0d4 of XOM, my open source, tree-based library for processing XML with Java. XOM derives from my experience writing Processing XML with Java. It attempts to combine the best aspects of the existing APIs like DOM and JDOM, while removing a lot of their quirkiness.

The major additions in 1.0d4 are methods to get and set the base URI of a Node. You can invoke getBaseURI from any Node object to retrieve the URL against which relative URLs in that Node should be resolved. This is calculated in keeping with XML Base. That is, if an xml:base attribute is in scope its value is used. Otherwise, the URI of the entity in which the Node appears is used. You can change the underlying URI of the entity using the setBaseURI method in ParentNode. When a document is built, the parser fills in the base URI for each node. This is stored separately from xml:base attributes, which are not treated differently than any other attribute. When a document is serialized, you may request that the serializer fill in extra xml:base attributes not present in the infoset to preserve the underlying base URIs. However, since this is a structural change to the document, this feature is turned off by default.
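For readers who haven't worked with XML Base, the lookup rule can be sketched with nothing but the JDK's DOM and java.net.URI. This is not XOM code and does not use XOM's getBaseURI; BaseUriSketch, baseUri, and parse are names invented here, and the entity URI is passed in by hand because a document parsed from a string has none. The sketch collects the xml:base attributes in scope, then resolves them from the outermost one inward, starting from the entity's URI.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class BaseUriSketch {
    /** Resolves the base URI of element from the xml:base attributes in scope, falling back to entityUri. */
    static URI baseUri(Element element, URI entityUri) {
        // Walk from the element up to the root; push() leaves the outermost value at the head.
        Deque<String> bases = new ArrayDeque<>();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            Element e = (Element) n;
            if (e.hasAttributeNS(XMLConstants.XML_NS_URI, "base")) {
                bases.push(e.getAttributeNS(XMLConstants.XML_NS_URI, "base"));
            }
        }
        // Resolve outermost first, so inner relative values build on outer ones.
        URI result = entityUri;
        for (String base : bases) {
            result = result.resolve(base);
        }
        return result;
    }

    /** Parses a string with namespace support so that xml:base is reported in the XML namespace. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc xml:base='http://example.com/a/'>"
            + "<part xml:base='b/'><leaf/></part><other/></doc>");
        URI entity = URI.create("http://example.org/original.xml"); // assumed entity URI
        Element leaf = (Element) doc.getElementsByTagName("leaf").item(0);
        Element other = (Element) doc.getElementsByTagName("other").item(0);
        System.out.println(baseUri(leaf, entity));  // http://example.com/a/b/
        System.out.println(baseUri(other, entity)); // http://example.com/a/
    }
}
```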

Beyond this, I expanded support for subclassing. Subclasses should now be able to intercept essentially any modification to a Node. (If you notice anywhere that's not possible, it's an oversight. Please let me know so I can fix it.) I replaced the Attributes and Namespaces classes with additional methods in Element. A number of bugs were fixed, especially in serialization. Finally, I renamed a number of methods to make their nature a little more obvious.


Pekka Enberg's posted version 0.1.6 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release fixes compilation errors on non-GNU systems and improves the build process.


Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This is a collection of XML tools written specifically to take advantage of the XNI API in Xerces2 including the NekoHTML parser and the NekoDTD parser. This release is now compatible with all versions of Xerces2 through 2.2.0. This release also adds support for processing instructions in HTML and fixes various bugs.

Sunday, September 29, 2002

RealNetworks, Inc. has submitted a note describing the eXtensible Media Commerce Language (XMCL) to the W3C. XMCL "is an interchange format that describes usage rules that apply to multi-media content. It is designed to communicate these rules in an implementation independent manner for interchange between business systems and DRM implementations responsible for enforcing the rules described in the language."

An XMCL document "describes the minimum, self-complete set of business rules under which digital media is licensed for consumer use. These business rules support multiple business models including rental, subscription, ownership, and video on demand/pay-per-view. When a business system authorizes a customer transaction for digital media, it generates a XMCL document that is then acted upon and enforced by a specific trusted system. The generated XMCL document is submitted to the trusted system through the APIs of the trusted system (e.g. HTTP POST, RPC call, API call)." For example, here's an XMCL document from the note that describes a movie rental:

<xmcl>
  <license>
    <contentInfo>
      <contentID type="GUID">
        13AC7DE5-8028-42fe-95CE-0DC2221891C7
      </contentID>
      <ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
        <ds:KeyName>ContentKey</ds:KeyName>
        <ds:KeyValue>
          <key algorithm="urn:nist-gov:tripledes-ede-cbc">
            3812A419C63BE771 AD9F61FEFA20CE63 3812A419C63BE771
          </key>
        </ds:KeyValue>
      </ds:KeyInfo>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description>
          <dc:title>First Blood</dc:title>
          <dc:subject>
            movie, action, adventure
          </dc:subject>
        </rdf:Description>
      </rdf:RDF>
    </contentInfo>
    <validPeriod start="2001614T184300"
                 end="2001621T184300"/>
    <usageRights>
      <useDuration length="24h" begin="firstUse"/>
    </usageRights>
  </license>
</xmcl>
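As a rough idea of what a consuming system might do with such a document, the sketch below pulls the title and the rental terms out with the JDK's DOM parser. It is only an illustration: XmclRentalSketch and describe are invented names, the embedded document is an abbreviated copy of the example above, and nothing here comes from the XMCL note's own processing rules.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class XmclRentalSketch {
    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns a one-line summary built from dc:title and the useDuration attributes. */
    static String describe(String xmcl) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look up dc:title by namespace
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmcl)));
            Element title = (Element) doc.getElementsByTagNameNS(DC_NS, "title").item(0);
            Element duration = (Element) doc.getElementsByTagName("useDuration").item(0);
            return title.getTextContent().trim() + ": "
                + duration.getAttribute("length") + " from " + duration.getAttribute("begin");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XMCL document", e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated version of the movie rental example.
        String xmcl = "<xmcl><license><contentInfo>"
            + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
            + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
            + "<rdf:Description><dc:title>First Blood</dc:title></rdf:Description>"
            + "</rdf:RDF></contentInfo>"
            + "<usageRights><useDuration length='24h' begin='firstUse'/></usageRights>"
            + "</license></xmcl>";
        System.out.println(describe(xmcl)); // prints "First Blood: 24h from firstUse"
    }
}
```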

IPR Systems has submitted an apparently competing proposal called Open Digital Rights Language (ODRL) Version 1.1. According to its abstract, ODRL is a "language for the Digital Rights Management (DRM) community for the standardisation of expressing rights information over content. The ODRL is intended to provide flexible and interoperable mechanisms to support transparent and innovative use of digital resources in publishing, distributing and consuming of electronic publications, digital images, audio and movies, learning objects, computer software and other creations in digital form."

Saturday, September 28, 2002

Centerpoint has released CenterPoint|XML 2.1.3, a C++ class library that supports SAX 2, DOM1, and DOM2. CenterPoint|XML is written in Standard C++, uses the C++ Standard Library and works on many platforms, including various Solaris, HP-UX, Mac OS X, Linux, Microsoft Windows NT/2000/XP, and OpenVMS. CenterPoint|XML is based on expat 1.2 which is included.

Friday, September 27, 2002

Opera Software has posted the first beta of Opera 6.0 for the Macintosh. New features include a zoom bar, OperaShow for desktop presentations, Unicode support, tabbed browsing, skins, and inline searching. Mac OS 9.1 or later is required. Opera is $39 payware/free-beer adware (your choice).


The Apache XML Project has released version 2.2.0 of Xerces-J, an open source XML processor written in Java. Version 2.2.0 makes two API-level changes to the Xerces Native Interface, improves performance, and fixes assorted bugs.


Eric van der Vlist has posted xvif 0.2.0, an XML Validation Interoperability Framework. This release includes a "partial implementation of Relax NG and a very partial implementation of W3C XML Schema datatypes." xvif is written in Python.

Thursday, September 26, 2002

The W3C Technical Architecture Group (TAG) has concluded that HLink is not consistent with the Web architecture. It's not clear whether the TAG has the authority to veto anything (though one of its members does), but nonetheless this is a big vote against HLink. Personally, I suspect this is correct. HLink is messy and confusing, and adds nothing significant to XHTML. Either old-style links or simple XLinks with some links moved to child elements seem fully capable of meeting XHTML's needs. I don't see the problem HLink is trying to solve as anything that actually needs solving.


The W3C XML Protocol Working Group has published a last call working draft of the SOAP 1.2 Attachment Feature. According to the abstract, "This document defines a SOAP feature that represents an abstract model for SOAP attachments. It provides the basis for the creation of SOAP bindings that transmit such attachments along with a SOAP envelope, and provides for reference of those attachments from the envelope. SOAP attachments are described using the notion of a compound document structure consisting of a primary SOAP message part and zero or more related documents parts known as attachments."

Wednesday, September 25, 2002

The W3C XML Core Working Group has posted the second candidate recommendation of XML Inclusions (XInclude) Version 1.0. The syntax is basically the same as in the last draft. However, now an XInclude processor is allowed to claim full conformance even if it only supports the element() scheme of XPointer. Support for full, XPath based XPointers is optional. In addition, namespace fixup has been changed. In particular, it is suggested but not required that implementations not include namespace attributes in the merged document. However, implementations are required to preserve in-scope namespaces, which can only be done in most cases via namespace attributes. Catch-22. Comments are due by November 1.


Pyana 0.6.0 is an extension module that allows Python programs to access Xalan 1.4.


Pekka Enberg's posted version 0.1.5 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release doesn't try to indent CDATA sections.

Tuesday, September 24, 2002

Here's a quick prerelease shot of the cover for Processing XML with Java:

Yes, I know DOM is mentioned twice. That will be fixed before the book is printed. Still and all, it looks pretty cool, doesn't it? Amazon is taking pre-orders. The printed book should be available in mid-November.


The Mozilla Project has released Phoenix 0.1, a lightweight browser that's based on Mozilla but is just a browser, no e-mail program, no news reader, no kitchen sink. This is good news for those of us who love the Mozilla browser, but use different e-mail and news programs. Phoenix differs from similar efforts like Galeon in that it's based on XUL and is designed for eventual cross-platform release. However, version 0.1 is only available on Windows and X86 Linux. Here's one call for a Mac version.

Monday, September 23, 2002

The Apache XML Project has released Xalan-C++ 1.4, an open source XSLT processor written in standard C++. Version 1.4 fixes bugs, adds a build for 64-bit HP/UX Redhat 7.2 with gcc 3.1, provides early implementations of a number of EXSLT functions, and includes a "new sample application that illustrates how to perform transformations with input in the form of a pre-built XalanDOM or XalanSourceTree." Xerces-C++ 2.1 is required for this release.


I've posted version 1.0d3 of XOM. This release refactors the node types with the net effect of removing 36 methods from various classes, thus further simplifying the API. The TreeNode class is gone, replaced by new ParentNode and a now public LeafNode class. Furthermore, all navigation methods are in Node, and all insertion and deletion methods are in ParentNode.

Saturday, September 21, 2002

I've posted version 1.0d2 of XOM. This release fixes the first bugs discovered, cleans up the source code, improves the JavaDoc, and makes a few changes to method names that seemed wise. API-level changes since Tuesday night include:

  • readAttribute is now getAttributeValue
  • howManyChildren is now getChildCount

Friday, September 20, 2002

IBM's alphaWorks has released the XML Integrator, a "tool for bi-directional data conversion between XML and structured data formats such as relational or LDAP data. This tool externalizes the specification of the mapping between XML and relational databases, and it replaces the programming effort by the simpler effort of writing a script that describes the relationships between the XML constructs and the corresponding RDBMS constructs. XI can be used as a stand-alone utility, or it can be integrated as a library in other applications."


Daniel Cazzulino's announced Schematron.NET 0.51, an open source Schematron validator based on .NET and written in C#. It uses XPath-only features, and understands embedded schemas.


Jochen Wiedmann's released JaxMe 1.4.5, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with bean managed persistence (BMP). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release fixes a few bugs.

Thursday, September 19, 2002

Eric S. Raymond has released version 1.0.1 of doclifter, a tool that transcodes {n,t,g}roff documentation to DocBook. He claims the "result is usable without further hand-hacking about 95% of the time." This release fixes a bug with entity resolution. Doclifter is written in Python, and requires Python 2.2a1. doclifter is published under the GPL.


Norm Walsh has released DocBook Website 2.3, a DocBook customization layer that allows you to author entire web sites in valid DocBook XML. Version 2.3 adds support for RSS, and is based on DocBook 4.2.


Pekka Enberg's XML Indent 0.1 is an open source (GPL) "XML stream reformatter written in ANSI C. It is analogous to GNU indent."

Wednesday, September 18, 2002

Last night at the New York XML SIG meeting, I unveiled XOM, a new XML Object Model for Java published under the LGPL. Like DOM, JDOM, dom4j, and ElectricXML, XOM is a read/write API that represents XML documents as trees of nodes. Where XOM diverges from these models is that it strives for absolute correctness and maximum simplicity. XOM is based on more than two years' experience with JDOM development, as well as the last year's effort writing Processing XML with Java. While documenting the various APIs I found lots of things to like and not like about all the APIs, and XOM is my effort to synthesize the best features of the existing APIs while eliminating the worst. It's closest in spirit to JDOM. I had originally intended to fork JDOM, but it rapidly became apparent that very little actual JDOM code would be left when I was done and that starting from scratch would give me a flexibility I wouldn't have by using the existing JDOM code base. There is one non-public class hidden deep in the bowels of the API that uses some JDOM code (taken from, not coincidentally, the single JDOM class I was most personally responsible for) but otherwise, XOM was written starting with a blank screen.

I also attempted to write XOM according to a more modern understanding of Java in particular and API design in general than I had two years ago when JDOM was started. I've spent a lot of time over the last year thinking about the ideas of (and in some cases arguing with) Bruce Eckel, Joshua Bloch, Bertrand Meyer, Ken Arnold, Erich Gamma, Kent Beck, Martin Fowler, and other design savants, sometimes in person, sometimes in e-mail, sometimes just in my own head. It may not be obvious at first glance, but these gurus collectively had a huge effect on the overall design of the API. For instance, Joshua Bloch's Effective Java gave me the courage to ignore the Cloneable interface and give my classes copy constructors instead. Bloch and Bruce Eckel together convinced me that many of the exceptions in XOM deserved to be runtime exceptions, not checked exceptions, a decision that makes a lot of code much cleaner.
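Both design choices are easy to show in miniature. The sketch below is not XOM code; Tag and BadNameException are hypothetical names chosen so they won't be mistaken for real XOM classes. It shows a copy constructor used in place of Cloneable, and a validation failure reported through an unchecked exception so callers aren't forced to wrap every constructor call in try/catch.

```java
import java.util.ArrayList;
import java.util.List;

final class CopyConstructorSketch {
    /** Hypothetical unchecked exception: callers may catch it but are not required to. */
    static final class BadNameException extends RuntimeException {
        BadNameException(String message) {
            super(message);
        }
    }

    /** Hypothetical node-like class with a copy constructor instead of clone(). */
    static final class Tag {
        private final String name;
        private final List<String> values = new ArrayList<>();

        Tag(String name) {
            // A deliberately crude check; real XML name rules are much stricter.
            if (name == null || name.isEmpty() || name.indexOf(' ') >= 0) {
                throw new BadNameException("Not a usable name: " + name);
            }
            this.name = name;
        }

        /** Copy constructor: the new object gets its own list, so later changes stay separate. */
        Tag(Tag original) {
            this.name = original.name;
            this.values.addAll(original.values);
        }

        void add(String value) {
            values.add(value);
        }

        String getName() {
            return name;
        }

        List<String> getValues() {
            return new ArrayList<>(values); // defensive copy
        }
    }

    public static void main(String[] args) {
        Tag original = new Tag("title");
        original.add("first");

        Tag copy = new Tag(original); // no cast and no CloneNotSupportedException
        copy.add("second");

        System.out.println(original.getValues()); // [first]
        System.out.println(copy.getValues());     // [first, second]
    }
}
```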

The actual release date snuck up on me about a week earlier than I was expecting, and although the software was ready, not all the supporting documentation, mailing lists, CVS repository, web servers, etc. were, so I'm going to be filling a lot of that in today. Update: The XOM-interest mailing list is now live. You can subscribe by sending e-mail to xom-interest-request@lists.ibiblio.org with just the word "subscribe" in the subject or body, or by filling in this form. This is a general discussion and development mailing list. For now, everyone's invited. Once XOM is more stable, I'll probably split out separate xom-development and xom-users lists, but it's too early in XOM's life cycle right now for the distinction to matter that much. The current web site on Cafe con Leche is temporary. I will eventually move it to its own domain at www.xom.nu, but I'll be sure to leave an automatic redirect behind when that happens.

In the meantime, if you're curious you can start by browsing the JavaDocs or looking over the notes from last night's presentation to the New York XML SIG. I consider the current version to be 1.0d1. In other words, the API is still open for discussion and change. Depending on what people think, it could take more or less time to reach an API freeze and begin an alpha and beta cycle. I do very much want to hear feedback. I'm going to try to get some mailing lists set up very quickly so we can have an ongoing discussion and back and forth, but that may take me a day or two. In the meantime, please keep good notes on any comments you have. :-)

Tuesday, September 17, 2002

Altova has released XMLSpy 5.0, a $990 payware XML editor for Windows. (If you want support, it will cost you $198 more.) New features in this release include

  • XML Schema driven Code Generation
  • XSLT Debugging
  • WSDL Editor
  • Java and C++ code generation
  • HTML site importing
  • Tamino native XML database Integration
  • Templates for DocBook, US Patent & Trademark Office "RedBook", News Industry Text Format (NITF), and News Markup Language (NewsML)
  • Spell-checking
  • Toolbar customization

Norm Walsh has released version 1.55.0 of his XSLT stylesheets for DocBook. I use these to generate both XSL-FO and HTML files for Processing XML with Java. Changes, new features, and bug fixes in this release include:

  • Lithuanian and Vietnamese localizations
  • Support orientation, rotated-width, and rotated-height on a processing instruction to rotate table cells in FO output
  • Restart all books on page 1
  • Experimental "chunkfast" support
  • Left-align monospaced verbatim environments in FO (finally!)

Walsh has also posted the second beta of DocBook Slides 3.0, a DocBook module for presentations based on DocBook XML 4.2. This release fixes a few bugs.


Aleksey Sanin's posted version 0.0.9 of his XML Security Library, an open source C library based on libxml2 and OpenSSL that supports

  • XML Signature
  • XML Encryption
  • Canonical XML
  • Exclusive Canonical XML

Sun's posted public draft 0.7 of the Java Architecture for XML Binding (JAXB) specification in the Java Community Process. JAXB compiles an XML schema into one or more Java classes. (First mistake: JAXB assumes there's a schema. Second mistake: It assumes the schema is written in the W3C XML Schema Language. Third mistake: It assumes documents actually adhere to the schema.) JAXB can unmarshal schema-valid XML into Java objects, read, update, and validate the Java objects against the schema, and write the result back out as XML. Changes since the last 0.21 release include:

  • Support for a subset of W3C XML Schema and XML Namespaces
  • More flexible unmarshalling and marshalling functionality
  • Validation process enhancements

Comments are due by October 16.

Monday, September 16, 2002

The W3C HTML Working Group has published a working draft of HLink: Link recognition for the XHTML Family. The basic idea is to add an hlink element to the head that identifies the type of link attributes on particular elements in the document. For example, the draft gives this hlink element as a definition for the classic a link:

<hlink namespace="http://www.w3.org/1999/xhtml"
  element="a"
  locator="@href"
  effect="replace"
  actuate="onRequest"
  replacement="@target"/>

HLink spits in the face of XLink. It starts the whole linking process over from scratch with a completely incompatible syntax. It is based on some deep problems the HTML working group has with XLink. In particular,

  • The HTML working group is unwilling to consider extended links. They insist on including multiple link attributes on single elements.
  • The HTML working group does not like the semantics XLink defines.

Because of the ongoing sniping between the XLink and HTML camps, this draft has a bit of a polemical feel to it. If you're not up on the intricacies of W3C politics, I recommend you keep a few things in mind while reading this:

  • This draft makes several mistakes in interpreting the semantics of XLink attributes. Do not trust what it says XLink can and cannot do, or what XLink does and does not mean.

  • This draft's XLink solutions are a lot uglier and more verbose than they would have to be in practice. Many of the examples could be reduced to a single xlink:href attribute, which would significantly reduce the "Yuck" factor that led to HLink in the first place.

  • Any version of HTML that incorporates either XLink or HLink will be incompatible with existing browsers and systems, completely aside from linking issues. For instance, the img element is being eliminated in XHTML 2.0 and frames are rewritten from scratch. Given this fact, backwards compatibility is not really a selling feature for either XLink or HLink.

I haven't yet made up my mind about this. There are some interesting ideas here. Just read the draft with a large bag of salt close at hand.

Sunday, September 15, 2002

Elharo.com and Macfaq.com will be down today while I try to upgrade the OS. I hope they'll be up again tomorrow.


Design Science has released MathPlayer 1.0, a free-beer MathML plug-in for Internet Explorer on Windows. I'm told it supports both content and presentation markup. Web pages need to include special elements and processing instructions to use MathPlayer; for example,

<head>
   <OBJECT ID=behave1 CLASSID="clsid:32F66A20-7614-11D4-BD11-00104BD3F987"></OBJECT>
   <?IMPORT NAMESPACE="M" IMPLEMENTATION="#behave1" ?>
   <title>Page with Math</title>
</head>

MathPlayer cannot display an arbitrary page containing some MathML. This is not how MathML is supposed to work. This seems likely to be due to limitations in the IE plug-in API.

Saturday, September 14, 2002

Norm Walsh has posted three candidate releases of DocBook modules:

  • Simplified DocBook V1.0CR3, a customization layer that reduces DocBook XML 4.2 to a more manageable size. (Previous versions were based on DocBook 4.1.)
  • DocBook EBNF V1.1CR1, an extension of DocBook 4.2 for Extended Backus-Naur Form grammars, such as those used in many W3C specifications including XML 1.0.
  • DocBook SVG V1.0CR1, an extension of DocBook XML 4.2 based on the Candidate Recommendation of SVG 1.1

DocBook is an XML and SGML application for narrative, technical documents. It's used for much of the Linux Documentation Project and my own Processing XML with Java.

Friday, September 13, 2002

Sun's posted a maintenance release of Java Specification Request 63, the Java API for XML Processing 1.2. It's not immediately obvious to me what has changed.

Thursday, September 12, 2002

The first alpha of Mozilla 1.2 has been posted for the usual batch of platforms. New features include "Type Ahead Find" and a pretty printed raw XML view, like that found in Internet Explorer. XML pretty printing is only available in the .zip distribution and is turned off by default because it affects the DOM for unstyled XML pages. To turn it on, add user_pref("layout.xml.prettyprint", true); to your user.js file.

Wednesday, September 11, 2002

Mozilla 1.0.1 has been released for Mac OS, Windows, OpenVMS, Solaris, and Linux. This release fixes over 600 assorted bugs, though sadly not the one bug in AppleScript that's keeping me from using it on my Mac. Mozilla has the best support for XML of any browser on the market today. Standards it supports include HTML 4.0, XML 1.0, the Resource Description Framework (RDF), Cascading Style Sheets Level 1 (CSS1) and Level 2 (CSS2), the Document Object Model Level 1 (DOM1) and Level 2 (DOM2), and XHTML. The entire user interface is written in XUL, the XML User Interface language. Java is supported via Sun's Java plug-in. On top of that, it lets you set the search function to Google and turn off pop-up ads. Mozilla has been my primary browser for the last year or so on Windows and Linux.


IBM has released Version 5.0 of XML for C++, a schema-validating XML parser based on Xerces-C 2.1. It adds 64-bit binaries, grammar pre-parsing and grammar caching, experimental DOM Level 3 support, International Components for Unicode (ICU) 2.2, and various bug fixes.

Tuesday, September 10, 2002

First, go read Bill Venners' interview with Ken Arnold quoted above. Then mark your calendar for Tuesday, September 17, because I've been thinking along similar lines for a while, and one week from tonight, at the New York XML Special Interest Group meeting in Manhattan, I'll be unveiling a new XML API that follows a lot of the principles Arnold outlines in his interview, including:

  • Give developers what they need, not what they want.
  • Simplicity counts.
  • Human factors matter in API design.
  • It's OK to raise complexity for implementers in order to lower complexity for users.
  • A library should make it easy to do the right thing and difficult to impossible to do the wrong thing.

I think most existing XML APIs violate one or more of these rules. DOM violates all of them, and quite a few more besides. I believe something better is possible, and on the 17th I intend to prove it. If you're curious, and you'd like to be in the audience, just drop a note to Walter Perry to reserve a spot. The meeting begins at 7:00 P.M. at the Goldman Sachs Training Center, 125 Broad Street, in lower Manhattan. Security requires that those attending this meeting be registered at least a day in advance so that their names are available to check against attendance at the door. Please register before Monday 16 September to ensure that you will be admitted.

P.S. If you can't be in New York on the 17th, I'll be posting everything online here on the 18th. Stay tuned.

Monday, September 9, 2002

Topologi has released the Topologi Collaborative Markup Editor 1.0.1, a $60 payware source code-level editor for XML and SGML that runs on Windows NT 4.0 and later and supports most of Unicode (with the exception of combining characters, and subject to font availability). Other features include database import, table views, and open and save to FTP servers.

Sunday, September 8, 2002

CMP has posted the Call for Papers for Software Development 2003 West. This conference will take place March 24-28, 2003, in Santa Clara. (Note: the dates currently listed on the web site are wrong.) Once again I'll be chairing the XML track. XML-wise, this conference tends to focus on more practical, how-to sessions than a typical XML conference, which runs more advanced and theoretical sessions.

We're looking for sessions that speak to programmers who are not necessarily XML experts but need to learn how to use SAX, DOM, namespaces, XSLT, schemas, etc. We're not as focused on bleeding edge topics and research projects as the more XML-specific shows. We're looking for ninety minute seminars, and half-day and full-day tutorials. If you haven't presented at this show before, you're much more likely to get picked for one or two ninety minute sessions than for a half or full-day session. We like to get to know speakers with a smaller session first.

Saturday, September 7, 2002

The W3C HTML Working Group has posted a note about XHTML™ 1.0 in XML Schema. "This document provides informative XML Schemas for XHTML 1.0."

Friday, September 6, 2002

The W3C RDF Core Working Group has posted the first public working draft of the Resource Description Framework (RDF): Concepts and Abstract Data Model. According to their abstract:

The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. This document defines the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that do not fall under the topics of formal semantics, XML serialization syntax or RDF schema and vocabulary definitions (which are each covered by a separate document in this series). These include: discussion of design goals, meaning of RDF documents, key concepts, character normalization and handling of URI references.

Andy Clark's posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This is a collection of XML tools written specifically to take advantage of the XNI API in Xerces2 including the NekoHTML parser and the NekoDTD parser. This release is now compatible with Xerces 2.1.0 and fixes a bug in NekoPull that prevented processing instructions from being propagated.

Thursday, September 5, 2002

The W3C Core XML Working Group has published the last call working draft of Namespaces in XML 1.1. The main changes since Namespaces 1.0 are:

  • IRIs are used instead of URIs. (IRIs can use non-ASCII characters like Θ without % escaping them.)
  • Namespace prefixes can be undeclared by using an xmlns:prefix="" attribute, just as the default namespace can already be unset by using an xmlns="" attribute.
  • The prefix xmlns is by definition bound to the namespace name http://www.w3.org/2000/xmlns/.

However, all of this only applies to XML 1.1 documents. XML 1.0 documents must use Namespaces 1.0. Comments are due by September 28.
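
To make the first two points a little more concrete, here is a minimal Python sketch. It is only an illustration: the example.com namespace names are made up, and Python's standard library parser implements XML 1.0 and Namespaces 1.0, so this shows %-escaping of a non-ASCII character and the xmlns="" mechanism for unsetting the default namespace, not Namespaces 1.1 itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# A hypothetical namespace name containing a non-ASCII character (Θ, U+0398).
iri = "http://example.com/ns/Θ"

# Written as a URI, the non-ASCII character must be %-escaped as UTF-8 bytes.
uri = quote(iri, safe=":/")
print(uri)  # http://example.com/ns/%CE%98

# xmlns="" removes the default namespace for an element and its descendants.
doc = ET.fromstring('<a xmlns="http://example.com/ns"><b xmlns=""><c/></b></a>')
print(doc.tag)        # {http://example.com/ns}a
print(doc[0].tag)     # b
print(doc[0][0].tag)  # c
```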


Arabica (nee SAXinC++) is an open source C++ XML parser toolkit that supports SAX2 and DOM2 by wrapping an underlying parser such as expat, Xerces, libxml, or the Microsoft XML parser COM component. It supports various string types. It is published under a BSD style license.


Eric van der Vlist has posted version 0.1.3 of the XML Validation Interoperability Framework (xvif), "a proposal for embedding pipes of transformations and validations within grammar based schema languages." In contrast with document based piping approaches, this focuses on "micro-pipes" that operate on individual information items such as attributes, text nodes, elements and so forth. The prototype is based on Relax NG and Python. This release adds "a basic yet general mechanism to plug in datatype libraries" and fixes a couple of bugs.

Wednesday, September 4, 2002

The XML Apache Project has released version 2.4.0 of Xalan-J, an open source XSLT 1.0 processor written in Java. New features in this release include:

  • Xerces-2 support
  • EXSLT extensions
  • Better extension handling overall
  • Various performance improvements and bug fixes
  • XSLTC 1.2

Microsoft has released version 1.0 of the Microsoft Xml Diff and Patch tool. This is a set of Windows tools written in C# that can compare two XML documents and produce a "Diffgram" describing the differences between them. The XML Patch tool can use the diffgram to update copies of the original document. XML Diff compares the documents as XML rather than performing a plain lexical comparison. For instance, it ignores the document encoding and the order of attributes. However, according to the documentation, it does distinguish between empty-element tags and empty elements written with a start-tag and an end-tag, which would be a bug. Update: It seems that the tool actually does treat empty-element tags and empty elements with a start-tag and end-tag the same, even though the documentation claims exactly the opposite.
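
The difference between an XML-aware comparison and a lexical one can be sketched in a few lines of Python. This is not Microsoft's algorithm and it produces no diffgram; it simply assumes that reducing both documents to Canonical XML (via the standard library's canonicalize function, Python 3.8 or later) is a reasonable stand-in for "comparing as XML". The same_xml helper and the sample documents are invented for the illustration.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

def same_xml(left: str, right: str) -> bool:
    """Compare two documents by canonical form rather than by raw text."""
    return canonicalize(left) == canonicalize(right)

# Same element, but different attribute order, quoting, and empty-element style.
a = '<item id="1" lang="en"/>'
b = "<item lang='en' id='1'></item>"
# A genuinely different document.
c = '<item id="2" lang="en"/>'

print(a == b)          # False: a lexical comparison sees two different strings
print(same_xml(a, b))  # True: the canonical forms are identical
print(same_xml(a, c))  # False: the attribute value really differs
```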

Tuesday, September 3, 2002

The W3C Technical Architecture Group has published the first public Working Draft of Architectural Principles of the World Wide Web. This document attempts to lay out the basic principles that underlie or should underlie what we normally call the Web. It describes some limitations on protocols, formats, and resources.

The main purpose is to inform development of future specifications and technologies such as SOAP, XHTML 2.0, and XForms so that different specs don't collide with each other or with the substructure of the Web. Past examples of the sorts of things this effort is trying to prevent in the future include SOAP's firewall tunnelling, the unbookmarkability of HTML frames, and RDF's deep schizophrenia about URIs and resources. The goal is to bring some order to the current morass of often mutually contradictory specifications. In many ways this reminds me of the Infoset's effort to synchronize the different XML specifications. In the end, the need for backwards compatibility meant the Infoset only added to the confusion with yet another data model. It never successfully subsumed and unified the existing data models of different specs like XPath and DOM. This effort is even more ambitious, covering not just XML but the entire universe of Web related technology.


Norm Walsh has published version 1.54.1 of the DocBook XSL Stylesheets. I'm using these for Processing XML with Java. This version makes many small improvements and bug fixes, especially in indexing and tables.

Egon Willighagen has released JReferences, "a BibTex like system for DocBook XML." He's also written a LinuxFocus article about it. "JReferences uses a file database backend with references and refers to IDs to those references. The actual bibliography is autogenerated and citations in the article are autonumbered."

Monday, September 2, 2002

The XML Apache Project has posted the fourth beta of Batik 1.5, an open source Scalable Vector Graphics (SVG) renderer written in Java. This release fixes many bugs and supports most of SVG 1.0.

Sunday, September 1, 2002

Netscape 7.0 has been released for the usual batch of platforms (Linux, Mac OS, Mac OS X, Windows 95/98/NT/2000/XP, etc.). Netscape supports XML, XSLT, CSS, XHTML, and more. This is basically the same browser as Mozilla 1.0.1, except with more advertising. New features (few of which are new to Mozilla users) include:

  • Tabbed Browsing
  • Bookmark Groups
  • Click-to-Search
  • Download Manager
  • Web Site Icon (Favicon) support
  • Save whole web pages
  • Full Screen Mode
  • Drag and drop bookmarks
  • Print Preview
  • P3P (Privacy Preference Project) support
  • Update Notification
  • Return receipts in Netscape Mail
  • More email import formats
  • S/MIME
  • Shared folders for IMAP
  • Secure LDAP
  • Offline LDAP.
  • Exporting contacts
  • AOL Instant Messenger (AIM) Buddy Icons
  • AIM Buddy Alerts: Set up sound and window alerts when specific buddies come
  • AIM File Transfer
  • Server-side Buddy List storage: Lets you access your Buddy List from any computer.
  • Integrated ICQ
  • Sign On at Launch
  • One-button publishing in Netscape Composer.

Thursday, August 29, 2002

I'm travelling for the next couple of days so there probably won't be any more updates until Sunday.


Hot on the heels of yesterday's Xerces-C release, the Apache XML Project has released version 2.1 of Xerces-J, an open source XML parser written in Java. Version 2.1 mostly fixes assorted bugs. It also adds a few features to the Xerces Native Interface (XNI) and the Post Schema Validation Infoset (PSVI) API. Finally, it adds experimental implementations of the DOM Level 3 DOMBuilderFilter and DOMWriterFilter interfaces.

Wednesday, August 28, 2002

The Apache XML Project has released version 2.1 of Xerces-C, an open source XML parser written in reasonably portable C++. Version 2.1 fixes assorted bugs, adds some more experimental support for DOM Level 3 (compareTreePosition, lookupNamespaceURI, lookupNamespacePrefix, and isDefaultNamespace), and bundles IA64 binaries for Windows and Linux.


Michael Kay's released version 7.2 of Saxon, an experimental XSLT 2.0 processor written in Java. This release adds support "for regular expressions, specifically the xsl:analyze-string instruction and the matches(), replace(), and tokenize() functions defined in the 16 August 2002 working drafts from W3C." Version 7.2 requires Java 1.4 or later. According to Kay, "Production systems should stick with Saxon 6.5.2 until the XSLT 2.0 and XPath 2.0 specs become more stable." Most users should not upgrade.

Tuesday, August 27, 2002

Version 1.1 of Mozilla has been released, a little to my surprise. Version 1.0 took so long that I wasn't expecting this till next year. I guess the Mozilla folks have finally gotten their momentum going. For anyone who's been in a cave for the last few years, Mozilla is the open source web browser/news reader/e-mailer that supports HTML, XHTML, CSS, XML, XSLT, MathML, JavaScript, and lots more. It has a very nice skinnable user interface with convenient features like tabbed windows and the ability to turn off pop-up ads. New features in version 1.1 include:

  • Full-screen mode on Linux
  • MathML support on the Mac
  • View source of selection
  • Display HTML mail as plaintext
  • Better bidirectional Arabic and Hebrew support
  • XBM image support
  • Image and plug-in blocking for Mail & News
  • Quartz rendering for Mac OS X
  • Many assorted bug fixes, rendering improvements, and speedups

It's my browser of choice on Windows and Linux, and I'm just waiting for one obscure AppleScript bug to get fixed before I make the switch to Mozilla on the Mac.


IBM's alphaWorks has updated their XML Schema Quality Checker. This tool reads a schema "written in the W3C XML schema language and diagnoses improper uses of the schema language." This release uses Xerces-2, adds some extra command line options, and integrates with Eclipse.

It also implements the "XML Schema 1.0 Specification Errata", which I note the schema working group has finally published.

Monday, August 26, 2002

Daniel Veillard's released version 1.0.20 of libxslt, the GNOME XSLT library and version 2.4.24 of libxml2, the GNOME XML parser for Linux. The new version of libxslt fixes assorted bugs, especially those that were interfering with the DocBook XSLT stylesheets. The new version of libxml improves canonicalization and XInclude support as well as fixing some bugs in XPath, the Python bindings, and the HTML serializer.


Friday, August 23, 2002

The W3C Web Content Accessibility Guidelines Working Group has published the first public working draft of Web Content Accessibility Guidelines 2.0. Quoting from the abstract:

W3C published the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) as a Recommendation in May 1999. This Working Draft for version 2.0 builds on WCAG 1.0. It has the same aim: to explain how to make Web content accessible to people with disabilities and to define target levels of accessibility. Incorporating feedback on WCAG 1.0, this Working Draft of version 2.0 focuses on checkpoints. It attempts to apply checkpoints to a wider range of technologies and to use wording that may be understood by a more varied audience.

The W3C User Agent Accessibility Guidelines Working Group has revised two working drafts, on User Agent Accessibility Guidelines 1.0 and Techniques for User Agent Accessibility Guidelines 1.0. The first document "provides guidelines for designing user agents that lower barriers to Web accessibility for people with disabilities (visual, hearing, physical, cognitive, and neurological)." The second "provides techniques for satisfying the checkpoints defined in 'User Agent Accessibility Guidelines 1.0'".


Thomas Weber's open source txt2docbook 0.8 is a Perl program that converts ASCII files to valid DocBook documents.

Thursday, August 22, 2002

The W3C HTML Activity has posted the second Last Call Working Draft of XForms 1.0. XForms is a new XML application that succeeds HTML forms, and will be used in XHTML 2.0. According to the draft introduction,

The primary difference when comparing XForms with HTML Forms, apart from XForms being in XML, is the separation of the data being collected from the markup of the controls collecting the individual values. By doing this, it not only makes XForms more tractable by making it clear what is being submitted where, it also eases reuse of forms, since the underlying essential part of a Form is no longer irretrievably bound to the page it is used in.

A second major difference is that XForms, while designed to be integrated into XHTML, is no longer restricted only to be a part of that language, but may be integrated into any suitable markup language.

XForms has striven to improve authoring, reuse, internationalization, accessibility, usability, and device independence.

For example, if I were to translate the subscription form on the mailing list page to XForms, I would first place the following content in the header of the page, specifying what data is collected and sent to the server:

<xforms:model id="devxml"
 xmlns:xforms="http://www.w3.org/2002/08/xforms/cr"
 xmlns:ml="http://namespaces.cafeconleche.org/mailinglists">
  <xforms:instance>
    <ml:subscription>
      <ml:username />
      <ml:realname />
      <ml:listaddress>dev-xml-subscribe@onelist.com</ml:listaddress>
      <ml:listtype>listserv</ml:listtype>
      <ml:listname>dev-xml</ml:listname>
    </ml:subscription>
  </xforms:instance>
  <xforms:submission action="/javafaq/cgi-bin/xmlmaillist.pl" method="post" id="subscribe" />
</xforms:model>

The body of the document would then contain form controls like these that provide data for particular elements defined in the model. XPath expressions identify which elements and attributes get filled in with data from the form:

<xforms:input ref="ml:username" model="devxml">
  <xforms:label>E-mail address</xforms:label>
</xforms:input>
<xforms:input ref="ml:realname" model="devxml">
  <xforms:label>Name</xforms:label>
</xforms:input>
<xforms:submit submission="subscribe">
  <xforms:label>Submit</xforms:label>
</xforms:submit>
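
To show what it means for a ref to identify a node in the instance data, here is a rough Python sketch of the binding step. It is not an XForms processor; the fill function and the sample values are invented for illustration, and the only real requirement it demonstrates is that the prefix used in a ref must be bound to the same namespace URI as the elements in the instance.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI binding used when resolving refs (same URI as in the model).
NS = {"ml": "http://namespaces.cafeconleche.org/mailinglists"}

# A trimmed-down copy of the instance data from the model.
instance = ET.fromstring(
    '<ml:subscription xmlns:ml="http://namespaces.cafeconleche.org/mailinglists">'
    "<ml:username/><ml:realname/>"
    "<ml:listname>dev-xml</ml:listname>"
    "</ml:subscription>"
)

def fill(instance, ref, value):
    """Store a control's value in the instance node that its ref selects."""
    node = instance.find(ref, NS)
    if node is None:
        raise KeyError("ref selects nothing in the instance: " + ref)
    node.text = value

# Hypothetical values a user might type into the two input controls.
fill(instance, "ml:username", "reader@example.com")
fill(instance, "ml:realname", "A. Reader")

# The filled-in instance is what would be serialized and submitted as XML.
print(ET.tostring(instance, encoding="unicode"))
```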

Benefits include

  • Strong schema-based typing for data validation
  • Forms are submitted as XML documents that can easily be processed by XML tools. In fact, with a little effort it's not hard to make a form a SOAP or XML-RPC request.
  • Easier internationalization
  • Enhanced accessibility
  • Greater device independence
  • Less scripting

Comments are due by September 4.

Wednesday, August 21, 2002

BEA, Sun, SAP and Intalio have submitted a note to the W3C proposing a Web Service Choreography Interface (WSCI). According to the abstract,

The Web Service Choreography Interface (WSCI) is an XML-based interface description language that describes the flow of messages exchanged by a Web Service participating in choreographed interactions with other services.

WSCI describes the dynamic interface of the Web Service participating in a given message exchange by means of reusing the operations defined for a static interface. WSCI works in conjunction with the Web Service Description Language (WSDL), the basis for the W3C Web Services Description Working Group; it can, also, work with another service definition language that exhibits the same characteristics as WSDL.

WSCI describes the observable behavior of a Web Service. This is expressed in terms of temporal and logical dependencies among the exchanged messages, featuring sequencing rules, correlation, exception handling, and transactions. WSCI also describes the collective message exchange among interacting Web Services, thus providing a global, message-oriented view of the interactions.

WSCI does not address the definition and the implementation of the internal processes that actually drive the message exchange. Rather, the goal of WSCI is to describe the observable behavior of a Web Service by means of a message-flow oriented interface. This description enables developers, architects and tools to describe and compose a global view of the dynamic of the message exchange by understanding the interactions with the web service.

Tuesday, August 20, 2002

The W3C Web Services Architecture Working Group has published the second public working draft of Web Services Architecture Requirements. According to the abstract,

The use of Web services on the World Wide Web is expanding rapidly as the need for application-to-application communication and interoperability grows. These services provide a standard means of communication among different software applications involved in presenting dynamic context-driven information to the user. In order to promote interoperability and extensibility among these applications, as well as to allow them to be combined in order to perform more complex operations, a standard reference architecture is needed. The Web Services Architecture Working Group at W3C is tasked with producing this reference architecture.

This document describes a set of requirements for a standard reference architecture for Web services developed by the Web Services Architecture Working Group. These requirements are intended to guide the development of the reference architecture and provide a set of measurable constraints on Web services implementations by which conformance can be determined.

My favorite part of this document is that it actually defines what the heck a web service is:

Definition: A Web service is a software application identified by a URI, whose interfaces and bindings are capable of being defined, described, and discovered as XML artifacts. A Web service supports direct interactions with other software agents using XML based messages exchanged via internet-based protocols.

In the past, I've noticed that how a web service is defined often depends on what a vendor is trying to sell me. Notably absent from this definition is any requirement to use HTTP.

Monday, August 19, 2002

XML.com has published my latest article, The XMLPULL API, a brief tutorial on the XMLPULL API. Capsule summary: pull parsing is a useful style of XML processing, but the existing Java implementations and APIs are severely flawed.


Eric S. Raymond has released doclifter 1.0.0, a tool that transcodes {n,t,g}roff documentation to DocBook. He claims the "result is usable without further hand-hacking about 95% of the time." Doclifter is written in Python, and requires Python 2.2a1. doclifter is published under the GPL.


Syncro Soft has released version 1.2.2 of <oXygen/>, a $65 payware XML editor written in Java that can run as an applet. <oXygen/> 1.2 supports XSLT and XSL-FO, among other features. Version 1.2.2 adds support for xsi:type attributes, read only tags, external parameters for XSLT, headers and footers for XSLT results, proxy server support, and various bug fixes.


Sunday, August 18, 2002

The W3C HTML Working Group has published the third working draft of Modularization of XHTML in XML Schema. This contains a W3C XML Schema Language schema for XHTML 1.1 that allows other vocabularies like SVG and MathML to be mixed in.


The W3C XML Protocol Working Group has published the first public working draft of SOAP 1.2 Attachment Feature. According to the abstract, "This document defines a SOAP feature that represents an abstract model for SOAP attachments. It provides the basis for the creation of SOAP bindings that transmit such attachments along with a SOAP envelope, and provides for reference of those attachments from the envelope. SOAP attachments are described using the notion of a compound document structure consisting of a primary SOAP message part and zero or more related documents parts known as attachments."

Saturday, August 17, 2002

The W3C XML Query and XSL Working Groups have released seven updated working drafts:

This release of XSLT 2.0 adds an xsl:analyze-string instruction that can process unmarked-up text by matching it against a regular expression. There's also a new unparsed-entity-public-id() function and a schema for XSLT 2.0 stylesheets.

This version of the XPath 2.0 specification adds more details about error handling and conformance. The precedence of operators is now explicitly specified. It introduces new unordered and idiv operators and removes the assert, precedes, follows, and => operators. "Every built-in type now has a constructor function with the same name as the type," and it "is now possible to cast to a derived atomic type from xs:string or from any supertype of the derived type (facets are checked during casting)."

The XQuery draft changes everything the XPath 2.0 draft changes, and also introduces xmlspace and default collation declarations into the Query Prolog. The use cases draft seems to have removed the reference and functions use cases, but has added a number of use cases for strongly typed data. (Typing and the need for typed queries has been one of the biggest controversies in the XSLT 2/XPath 2/XQuery development).

Friday, August 16, 2002

Bertrand Delacrétaz has posted jfor 0.7.1, an open source XSL-FO to RTF converter. This release adds support for several additional formatting objects and properties including page footers, background-color, space-before, space-after, number-columns-spanned, and automatic vertical merging of table cells.


Opera Software has updated its namesake Opera web browser for Windows to version 6.0.5. This release fixes some security flaws in the SSL implementation and a few other miscellaneous bugs.

Thursday, August 15, 2002

Safari's posted the second edition of XML in a Nutshell, so if you prefer to read your references online, you can rent it there. It's not absolutely clear what the incremental cost for this book would be, but the minimum Safari price is $9.99 per month, which buys you access to several books. The exact titles can be changed from month to month.

Wednesday, August 14, 2002

The answer to yesterday's question about how to permanently nuke a file or directory out of the CVS repository appears to be "manually login to the CVS server, and use Unix rm commands." I'm not sure if I can do that on SourceForge or not, but I'll give it a shot.


The W3C HTML Activity has published a revised working draft of XML Events, a module that "provides XML languages with the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 event interfaces" in order to associate behaviors with elements.


The Jakarta Apache Project has released Commons Digester 1.3, a package that:

lets you configure an XML -> Java object mapping module, which triggers certain actions called rules whenever a particular pattern of nested XML elements is recognized. A rich set of predefined rules is available for your use, or you can also create your own. Advanced features of Digester include:

  • Ability to plug in your own pattern matching engine, if the standard one is not sufficient for your requirements.
  • Optional namespace-aware processing, so that you can define rules that are relevant only to a particular XML namespace.
  • Encapsulation of Rules into RuleSets that can be easily and conveniently reused in more than one application that requires the same type of processing.
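The pattern-triggers-rule idea is easy to sketch outside Java. The following Python is not Digester's API; PatternRules and add_rule are hypothetical names, and the matching is a plain comparison of the full element path, which is much simpler than Digester's real pattern matching.

```python
import xml.sax

class PatternRules(xml.sax.ContentHandler):
    """Fire registered actions when a nested element path is recognized.

    Hypothetical sketch of the idea, not the Digester API.
    """

    def __init__(self):
        super().__init__()
        self.rules = {}   # "outer/inner" path -> list of actions
        self.path = []    # stack of currently open element names

    def add_rule(self, pattern, action):
        self.rules.setdefault(pattern, []).append(action)

    def startElement(self, name, attrs):
        self.path.append(name)
        for action in self.rules.get("/".join(self.path), []):
            action(name, dict(attrs.items()))

    def endElement(self, name):
        self.path.pop()

titles = []
handler = PatternRules()
handler.add_rule("library/book", lambda name, attrs: titles.append(attrs["title"]))
xml.sax.parseString(
    b'<library><book title="XML in a Nutshell"/>'
    b'<shelf><book title="ignored"/></shelf></library>',
    handler,
)
```

Only the book directly under library fires the rule; the one nested inside shelf has a different path and is skipped.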
Tuesday, August 13, 2002

I've begun the process of moving my XIncluder project into SourceForge. This should enable me to make somewhat more frequent releases, and create some mailing lists for the project and perhaps XInclude in general. I've chosen the LGPL as the license. The current CVS tree includes one significant bug fix in SAXXIncluder since 1.0d9.

I'm still trying to decode the intricacies of SourceForge and CVS, and there are probably still a few bugs in the system. Two quick questions:

  1. Is there any way to delete a directory from the current tree?
  2. Is there any way to permanently delete a file from CVS? i.e. not just from the current tree, but from the history as well? My initial check-in added some Unix housekeeping files like .nautilus-xml, that really don't belong there.

If you know how to do either of these, please send me an e-mail. Thanks!

Monday, August 12, 2002

The W3C HTML and SVG working groups have jointly published an XHTML+MathML+SVG profile that combines XHTML 1.1, MathML 2.0, and SVG 1.1 to allow valid documents to combine XHTML, MathML and SVG.

Sunday, August 11, 2002

The W3C Web Ontology Working Group has published three working drafts describing OWL, "a semantic markup language for publishing and sharing ontologies on the World Wide Web":

OWL is derived from the DAML+OIL Web Ontology Language and builds upon the Resource Description Framework (RDF) and RDF's XML syntax. The OWL language enables applications to "understand the content of information instead of just understanding the human-readable presentation of content. OWL facilitates greater machine readability of web content than XML, RDF, and RDF-S support by providing a additional vocabulary for term descriptions."

Saturday, August 10, 2002

The W3C Cascading Style Sheets Working Group has posted the Candidate Recommendation of CSS TV 1.0. This profile "defines a subset of Cascading Style Sheets Level 2 and 'CSS3 module: Color' specifications tailored to the needs and constraints of TV devices."

The CSS Working Group has also updated the Candidate Recommendation of CSS mobile profile 1.0. This profile "defines a subset of the Cascading Style Sheets Level 2 specification tailored to the needs and constraints of mobile devices." The changes since the last draft are small, and include:

  • Alternative style sheets are now a "should" instead of a "must"
  • The '>' combinator (child selector) is now required
  • System fonts may be ignored

Ronan Oger's posted an alpha of YASB, an "open-source, cross-platform Perl/Tk SVG browser which supports simple SVG images." YASB only supports the drawing primitives at this point.

Friday, August 9, 2002

The W3C Cascading Style Sheets Working Group has published a working draft of Cascading Style Sheets, level 2 revision 1, that is, CSS 2.1. Unusually for a new spec, this goes backwards from the previous version. It focuses on removing properties from CSS2 rather than adding them. The impetus for removal is the failure of browser vendors to implement them. Features removed include:

  • Named counters
  • font-stretch
  • font-size-adjust
  • Aural style sheets
  • text-shadow

In addition, CSS 2.1 adds support for media-specific style sheets, content positioning, table layout, internationalization and some properties related to user interface. It also "corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's 'style' attribute and a new calculation of the "clip" property)."

The CSS working group has also posted new working drafts of several CSS3 modules:

CSS3 module: Basic User Interface

According to the abstract, this working draft contains:

  • Pseudo-classes and pseudo-elements to style user interface states and element fragments respectively.
  • Additions to the user interface features in CSS2.
  • The ability to style the appearance of various standard form elements in HTML4 and properties to augment or replace some remaining stylistic attributes in HTML4.
  • Directional focus navigation properties.
  • A mechanism to allow the styling of elements as icons for accessibility.

CSS3 module: Backgrounds

This defines properties related to backgrounds including:

  • background-color
  • background-image
  • background-repeat
  • background-attachment
  • background-position
  • background-clip
  • background-origin
  • background-size
  • background-quantity
  • background-spacing
  • background

Last Call ends August 30.

CSS3 module: Web Fonts

This spec presents a set of properties allowing font description by a browser. "This specification is very close to the similar section in CSS 2 [CSS2]. Only errata have been applied." Last Call ends 30 August 2002.

CSS3 module: Fonts

This defines properties for font specification and decoration including:

  • font-family
  • font-style
  • font-variant
  • font-weight
  • font-stretch
  • font-size
  • font-size-adjust
  • font
  • font-effect
  • font-smooth
  • font-emphasize-style
  • font-emphasize-position
  • font-emphasize

Last Call ends 30 August 2002.

Thursday, August 8, 2002

Wolfgang Meier of the Darmstadt University of Technology has posted version 0.8 of eXist, an open source native XML database that supports fulltext search. XML can be stored in either the internal, native XML-DB or an external relational database. The search engine has been designed to provide fast XPath queries, using indexes for all element, text and attribute nodes. The server is accessible through HTTP and XML-RPC interfaces and supports the XML:DB API for Java programming. Version 0.8 enables the database engine to run either as a stand-alone server process, embedded into an application, or inside a servlet context. The XML:DB API implementation now "supports embedded as well as remote access to the database. Additional changes include a new SOAP interface, WebDAV integration (using Xincon), performance improvements, and many bugfixes."

Wednesday, August 7, 2002

Sun has released version 1.0_01 of the Java Web Services Developer Pack (Java WSDP). This pack bundles together minor updates to its various component technologies. It includes:

  • The Java XML Pack which includes:
    • Java API for XML Messaging (JAXM) v1.1_01
    • Java API for XML Processing (JAXP) v1.2_01
    • Java API for XML Registries (JAXR) v1.0_02
    • Java API for XML-based RPC (JAX-RPC) v1.0_01
    • SOAP with Attachments API for Java (SAAJ) v1.1_02
  • JavaServer Pages Standard Tag Library (JSTL) v1.0.1
  • Java WSDP Registry Server v1.0_02
  • Web Application Deployment Tool
  • Ant Build Tool 1.4.1
  • Apache Tomcat 4.1.2

The W3C HTML Working Group has posted the first public working draft of XFrames, an XML application for composing documents together in a single view (but not merging the infosets). The goal is to replace the already deprecated HTML frames. The spec defines seven elements:

  • frames
  • head
  • title
  • style
  • row
  • column
  • frame

For example, consider this document:

<row>
    <column>
        <frame/>
        <frame/>
    </column>
    <frame/>
</row>

It produces a layout like this:

 -------
|   |   |
|---|   |
|   |   |
 -------

The exact size of each frame is determined by CSS.

I've scrupulously avoided frames in my own work, so it's not obvious to me how this proposal differs from what we have now. In particular, I don't see how this fixes any of the existing problems with back buttons, bookmarks, search engines, or browsers with too little space for all the frames.

Tuesday, August 6, 2002

The W3C HTML Working Group has published the first working draft of XHTML 2.0 (not to be confused with the recently published second edition of XHTML 1.0). Changes include:

  • The applet tag is replaced with the object tag.
  • The img tag is replaced with the object tag.
  • Forms are replaced by XForms
  • Events are replaced by XML Events
  • Frames are replaced by the as-yet unreleased XFrames
  • All deprecated tags from HTML 4 are removed.
  • The href attribute can be attached to most elements so that any element can be a link.

Monday, August 5, 2002

XML.com has published Using XInclude, a brief introductory article I wrote that explains the difference between XInclude and external entity references, and shows you how to use XInclude to compose documents out of multiple well-formed XML documents.
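For a quick feel of what inclusion does, here is a minimal sketch using Python's standard xml.etree.ElementInclude module, which implements only a subset of XInclude. The chapter1.xml name, the CHAPTERS dictionary, and the in-memory loader are assumptions made up for this example so that it runs without touching the file system.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for documents on disk
CHAPTERS = {
    "chapter1.xml": "<chapter><title>Getting Started</title></chapter>",
}

def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", raw text otherwise."""
    if parse == "xml":
        return ET.fromstring(CHAPTERS[href])
    return CHAPTERS[href]

book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter1.xml"/>'
    '</book>'
)

# Replaces each xi:include element in place with the loaded content
ElementInclude.include(book, loader=loader)
```

After the call, the xi:include element is gone and the chapter element sits in its place. Unlike an external entity reference, the included document is parsed on its own and has to be well-formed by itself.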


The W3C XML Encryption working group has published revised candidate recommendations for Decryption Transform for XML Signature and XML Encryption Syntax and Processing. Comments are due by September 13.

Sunday, August 4, 2002

The W3C HTML Working Group has published the second edition of XHTML 1.0. Like XML 1.0, second edition, the second edition of XHTML 1.0 "is not a new version of XHTML 1.0 (first published 26 January 2000). The changes in this document reflect corrections applied as a result of comments submitted by the community and as a result of ongoing work within the HTML Working Group. There are no substantive changes in this document - only the integration of various errata." Surprisingly, this spec was released as a full recommendation, completely skipping the normal working draft, last-call working draft, candidate recommendation, proposed recommendation steps. I hope they got it right.

Tuesday, July 30, 2002

I'm traveling for the next few days. There probably won't be any updates until I get back on Monday.


DSTC's xs3p is an XSLT stylesheet that converts W3C XML Schemas into a nicely formatted, hyperlinked, XHTML document for convenient browsing.

Monday, July 29, 2002

Jiri Tobisek's XMLtype 0.5 is a GPL'd, Linux console-based editor for narrative-oriented XML documents encoded in UTF-8. It supports bi-directional text. This release fixes various bugs.


Norm Walsh has published version 1.53 of the DocBook XSL Stylesheets. I'm using these for Processing XML with Java. This version fixes some bugs, refactors XSL-FO page masters, and adds some new parameters.

Sunday, July 28, 2002

The MindElectric has released ElectricXML 5.0, a closed source, free-beer, tree-based API for processing XML with Java. This release does allow developers to prevent the parser from throwing away white space. However, the default behavior is still backwards. And there are many other areas in which ElectricXML simply does not correctly model XML.

The most recent example I noticed was that ElectricXML treats the XML declaration as a processing instruction. I admit that a lot of developers think the XML declaration is a processing instruction. I've made that mistake myself in the past. However, the XML specification is quite clear that the XML declaration is not a processing instruction, and an API shouldn't pretend that it is.
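The distinction is easy to observe with a conforming parser. This short sketch uses Python's standard xml.sax module rather than ElectricXML; the PICollector class name is made up for the example. The handler records every processing instruction the parser reports.

```python
import xml.sax

class PICollector(xml.sax.ContentHandler):
    """Record the target of each processing instruction reported by the parser."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def processingInstruction(self, target, data):
        self.targets.append(target)

doc = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<?xml-stylesheet href="style.css" type="text/css"?>'
    b'<root/>'
)

collector = PICollector()
xml.sax.parseString(doc, collector)
```

Only the xml-stylesheet instruction shows up in collector.targets. The XML declaration looks similar on the page, but the parser never passes it to processingInstruction.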

Here's another problem I just discovered. According to the JavaDoc for the Document class, "When parsing XML, you can supply an optional Hashtable of namespace values that acts as global context." Why would you want to do that? All namespaces used should be declared in the document itself, and if they aren't, that's an error that should be reported to the client application. The more time I spend looking at this API, the more flaws I find. I just don't think ElectricXML is suitable for real work.
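For comparison, a namespace-aware parser treats an undeclared prefix as an error rather than looking it up somewhere outside the document. A small sketch with Python's standard ElementTree (the is_namespace_well_formed helper name is hypothetical):

```python
import xml.etree.ElementTree as ET

def is_namespace_well_formed(text):
    """Return True if the document parses, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Prefix used without a declaration anywhere in the document
undeclared = '<svg:rect width="10" height="10"/>'

# Same element with the namespace declared in the document itself
declared = (
    '<svg:rect xmlns:svg="http://www.w3.org/2000/svg" '
    'width="10" height="10"/>'
)
```

The first document is rejected because the svg prefix is unbound; the second parses because the declaration travels with the document.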


Jochen Wiedmann's released JaxMe 1.4.1, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with BMP (bean managed persistence). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release fixes a few bugs and allows generated sources to be compatible with EJB 1.1 and not necessarily 2.0.

Saturday, July 27, 2002

The W3C DOM Working Group has published two new working drafts:

  • Document Object Model (DOM) Level 3 Validation Specification defines interfaces that enable a program to test nodes for validity against the document's schema. It is a vastly cut-back subset of what was previously done as part of abstract schemas. The working group also released a note on Abstract Schemas, solely to make it clear that work on this spec has stopped, and will not be pursued further at the current time. This is a good thing. The abstract schemas API that was being developed was far too complex and far too tied to the W3C XML Schema Language. Killing this now opens up the way for many different developers to experiment with different schema APIs. Maybe in a few years we'll have enough experience with these to standardize one, but right now the community just does not know enough to do this well.

  • The Document Object Model (DOM) Level 3 Load and Save Specification describes how new DOM Document objects can be created, how XML text files can be parsed into DOM Document objects, and how DOM Document objects can be serialized back into text files. There are lots of small changes in this draft. I'll be updating the DOM chapters of Processing XML with Java in the next couple of days to incorporate them.

Friday, July 26, 2002

Jiri Tobisek's XMLtype 0.4 is a GPL'd, Linux console-based editor for narrative-oriented XML documents encoded in UTF-8. It supports bi-directional text.


IBM's alphaWorks has released version 3.2 of their Web Services Toolkit. According to IBM, the "basic software components needed to create a Web services environment are provided with Web Services Tool Kit. Included is an architectural blueprint (Web Services Architecture), sample programs, Utility services, and some tools that are helpful in developing and deploying Web services. Extensive documentation is included to assist developers with the basic concepts of Web services. The tool kit also includes a fully-functioning Web services client API that can be used to directly access a UDDI registry." New features in 3.2 include a Service Level Agreement (SLA)-based management system, a Web Services Matchmaking engine, updated WS-Security features, the Web Services Experience Language, and a new SOAP service monitor. Java 1.3 or later is required.

Thursday, July 25, 2002

Frédérik Bilhaut's Xineo XIL 0.5.0 defines an XML language for transforming various record-based data sources into XML documents, and provides a fully functional XIL processing implementation. This release supports SQL sources via JDBC and structured text sources such as comma separated values. It can be extended with new data source implementations. Xineo XIL is published under the LGPL.


Jonathan Bartlett's XML Tangle 0.6 is an open source tool that implements the "tangle" portion of Donald Knuth's Literate Programming. Traditional stylesheet utilities can be used for the weave (CSS/DSSSL/XSLT/etc). According to Bartlett, "XML has this amazing feature that is terribly underused - processing instructions. Thus, I decided to create literate programming tools that could be used by any DTD, and the information I needed for doing literate programming would be discovered through processing instructions." XML Tangle is published under the GPL.


Syncro Soft has released version 1.2.1 of <oXygen/>, a $65 payware XML editor written in Java that can run as an applet. <oXygen/> 1.2 supports XSLT and XSL-FO, among other features. Version 1.2.1 improves XPath support, uses the schema to inform the code insight, works better on the Mac, and uses a custom class loader to avoid interference from Java extensions.


IBM's alphaWorks has posted a beta of Netscape 7 for AIX. This browser is based on the Mozilla 1.0RC2 code base and supports XML, CSS, XSLT, and other fun things.

Wednesday, July 24, 2002

The Apache XML Project has released Xerces C++ 2.0.0. New features in version 2.0 include:

  • Complete support for the W3C XML Schema Language
  • The "Apache Recommended DOM C++ Binding"
  • Experimental DOM Level 3
  • 64 bit binaries
  • Grammar preparsing and Grammar caching
  • Follow Unix Shared Library Naming Convention
  • Option not to load the external DTD subset
  • Project files for Microsoft Visual C++ .Net
  • Codewarrior 8 support
  • Option to enable/disable strict IANA encoding name checking

Naturally, this is open source under the Apache license. Binaries are available for Windows and most major *n*xes except Mac OS X. (You know, now that Mac OS X is the best selling Unix on the planet, maybe it behooves vendors, open source and otherwise, to start thinking about including it in their release plans.)


Microsoft's posted a beta of the XmlDiff and Patch tool as a web app that can compare and produce a diff for two arbitrary XML documents. Patching and standalone use are not yet supported.

Tuesday, July 23, 2002

The Mozilla Project has posted the first beta of Mozilla 1.1 for the usual batch of platforms (Mac, Windows, Linux, Solaris, OpenVMS). New features in 1.1 are fairly minor and include full-screen mode for Linux, improved Hebrew and Arabic support, and a much enhanced JavaScript debugger. In addition some rendering bugs have been fixed. Mozilla supports XML, CSS, XSLT, MathML, HTML, XHTML, and lots of other cool acronyms. Best of all, it lets you turn off pop-up ads. It's my browser of choice on Windows and Linux, and I'm just waiting for one obscure AppleScript bug to get fixed before I make the switch to Mozilla on the Mac.


Luis Argerich's posted version 1.11 of his PHP XML Classes. This release adds a class that can check if a file or URL is well-formed.


Php.XPath 3.2 is an open source PHP class for accessing XML documents through the powerful XPath language without requiring the DOM XML extensions to be set up on your server.

Monday, July 22, 2002

The W3C/IETF joint XML Signature Working Group has released the candidate recommendation of XML-Signature XPath Filter 2.0. According to the abstract, "XML Signature [XML-DSig] recommends a standard means for specifying information content to be digitally signed and for representing the resulting digital signatures in XML. Some applications require the ability to specify a subset of a given XML document as the information content to be signed. The XML Signature specification meets this requirement with the XPath transform. However, this transform can be difficult to implement efficiently with existing technologies. This specification defines a new XML Signature transform to facilitate the development of efficient document subsetting implementations that interoperate under similar performance profiles."

Sunday, July 21, 2002

Jochen Wiedmann's released JaxMe 1.4.0, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with BMP (bean managed persistence). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion.

Saturday, July 20, 2002

Logilabs' Python XmlTools 1.3.7 is a GPL'd pair of pyGTK widgets that can display and edit an XML document in a graphical fashion.

Friday, July 19, 2002

FileMaker Inc. has released version 6.0 of their namesake FileMaker database for Mac and Windows. The big new feature in this release is XML import and export across the product line. In essence, you can import data from any XML document by writing an XSLT stylesheet that maps that document into FileMaker's FMPXMLRESULT vocabulary. Going the other direction, you can export query results to any XML format, again by writing an XSLT stylesheet that does the conversion. I've been doing XML and HTML exports from FileMaker with FileMaker scripts, calculation fields, and AppleScript for years, but this strikes me as much cleaner and easier. The base version of FileMaker is $299 payware. Upgrades are $149. Server versions range up to $999 for unlimited clients.

Thursday, July 18, 2002

Norm Walsh has released version 4.2 of DocBook. DocBook is a general purpose XML and SGML document type particularly well suited to books and papers about computer hardware and software. It's been used for the Linux Documentation Project, several O'Reilly books, and my own Processing XML with Java. Version 4.2 is upwards-compatible with 4.1.2. That is, all documents that are valid against 4.1.2 are still valid against 4.2. New features in this release include:

  • EBNF markup
  • HTML Forms
  • MathML
  • SVG
  • More systemitem classes
  • methodsynopsis now includes a language attribute

Michael Fuchs has posted version 0.2.7 of his DocBook Doclet that creates DocBook SGML and XML documents from JavaDoc. This release adds the docbook.introduction.chapters property so you can write the introduction of the resulting book in DocBook itself. It also fixes a couple of bugs.


Andy Clark's posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This is a collection of XML tools written specifically to take advantage of the XNI API in Xerces2 including the NekoHTML parser and the NekoDTD parser. This release fixes some bugs and adds new tools for validating documents against RELAX NG schemas with Xerces.

Wednesday, July 17, 2002

LimeWire 2.5.3 has been released. This GPL'd Gnutella client is my file sharing tool of choice these days. It's written in 100% pure Java, and runs nicely on my Linux box. On my Mac, however, it's dog slow. New features in this release include Browse Host, parallel connections on downloads, and full support for the Hash/Urn Gnutella Extension (HUGE).


Brendan Macmillan's Java Serialization for XML (JSX) 1.0.2.0 claims to allow "all objects to be written and read as XML, using Java's standard Serialization API. Objects from your present application version can be migrated to the next version, despite class evolution, by processing the XML with XSLT, SAX, DOM, or JDOM. Unlike java.beans.XMLEncoder/XMLDecoder, JAXB, and Castor, JSX works for all objects." It seems to work by chaining to an ObjectInputStream/ObjectOutputStream and converting the binary serialization to XML. It's a really neat hack. JSX is published under the GPL.


The OpenOffice Project has released OpenOffice 1.0.1, a bug fix release of the open source office suite that saves all its files as zipped XML. I've been using OpenOffice Writer for my next book, and mostly I'm happy with it. It's certainly the most serious competitor to Word I've seen in over a decade. One of the new features is "a .pdf installation guide in the package". I actually tried making some PDF files yesterday and ended up with pure PostScript files instead so maybe this will help.

Tuesday, July 16, 2002

IBM's alphaWorks has released UDDI for Python, a "Python package that allows the sending of requests to and processing of responses from the UDDI Version 2 APIs." UDDI stands for "Universal Description, Discovery, and Integration," and if you think that's a bunch of gibberish that doesn't really mean anything, well, let's just say I won't disagree with you. Clients communicate with UDDI registries using SOAP, and thus XML. This is a somewhat higher level API that shields developers from the underlying XML transport.

Monday, July 15, 2002

The Apache XML Project has released version 2.0.3 of the Cocoon application server. Apache Cocoon is an XML framework that raises the usage of XML and XSLT technologies for server applications to a new level. Designed for performance and scalability around pipelined SAX processing, Cocoon offers a flexible environment based on the separation of concerns between content, logic and style. A centralized configuration system and sophisticated caching top this all off and help you to create, deploy and maintain rock-solid XML server applications. Version 2.0.3 is a "maintenance release focusing on improved performance, robustness and better documentation." New features include:

  • The build system automatically detects the version of Java in use and builds a target for this version.
  • Proper escaping of national characters included in element's attributes in XSP page.
  • The SQLTransformer now tries to open a connection to the database several times before returning an error.
  • CocoonServlet no longer builds its own classloader by default.
  • Added a "handle-exceptions" init argument in web.xml, used by CocoonServlet for the exceptions that the core Cocoon class throws.
  • Parameterizable URLFactories.
  • Novell port
  • Input modules for Date, Digest, ConstantString, Random, NullInput, and Collection
  • A new enumerated values constraint for strings
  • Added capability to store/fetch XML to SQLTransformer.
  • Added AbstractSAXTransformer for custom transformers.

The Apache XML Project has released XML Security v1.0.4. This is an implementation of security related XML standards including Canonical XML, and XML Signature Syntax and Processing. Version 1.0.4 improves Java 1.4 support and uses the most recent version of the Bouncy Castle JCE.

Sunday, July 14, 2002

The W3C CSS Working Group has released the Media Queries Candidate Recommendation. According to the abstract,

HTML4 and CSS2 currently support media-dependent style sheets tailored for different media types. For example, a document may use sans-serif fonts when displayed on a screen and serif fonts when printed. "Screen" and "print" are two of media types that have been defined. Media Queries extend the functionality of media types by allowing more precise labeling of style sheets.

A Media Query consists of a media type and one or more expressions to limit the scope of style sheets. Among the media features that can be used in media queries are "width", "height", and "color". By using Media Queries, presentations can be tailored to a specific range of output devices without changing the content itself.

Saturday, July 13, 2002

Bertrand Delacrétaz has posted jfor 0.7.0, an open source XSL-FO to RTF converter. This release fixes a few bugs and makes the license Apache compatible.

Friday, July 12, 2002

Almost two years ago at XMLOne in San Jose, I publicly predicted that schema repositories were going to fail. Now I get to say, "I told you so." Microsoft's BizTalk is closing its doors one week from today. Several of the other sites I targeted for failure are still going, but only XML.org seems to be functioning as a real schema repository any more, and it's no longer emphasizing that task. I'm still predicting that UDDI will be a massive failure that never achieves any significant adoption. Most of the rest of my predictions still seem on-target, though I was a little early at picking 2001 as the year for SVG. 2002 is just starting to see large SVG use. 2003 seems to be the year where it will become more ubiquitous.


The W3C DOM Working Group has announced that they've halted work on DOM Abstract Schemas. This strikes me as sensible. The whole abstract schemas model they'd created was extremely unwieldy, didn't really fully handle either DTDs or schemas, and was completely unable to handle alternative languages like RELAX NG and Schematron. This is a good example of a process that really did need to be scrapped and started over from scratch, if at all, and I'm glad the DOM working group had the courage to admit their problems. This opens up the field for other researchers to try a variety of approaches to schema APIs. The "Document Editing" part of Abstract Schema will be spun off as a new "Validation" module, so it will still be possible to do programmatic validation as part of DOM. It just won't be possible to model the schemas directly.


At the publisher's request, I've added a Recommended Reading section to Processing XML with Java. Mostly this lists the various specifications for the technologies discussed in the book, as well as half a dozen books I consulted during the writing process, though most of these are my own. I find the real information in this space is generally available online long before it gets into books. Indeed, in numerous cases Processing XML with Java will be the first book I'm aware of that discusses certain bleeding edge technologies.

Thursday, July 11, 2002

The W3C XML Linking Working Group has split the XPointer spec into four last call working drafts:

The combined changes in these drafts seem to amount to the following:

  • ID-type attributes declared in schemas can now be used as the target of bare name XPointers.
  • Child sequences must be enclosed in an element() scheme. That is, to select the third child element of the second child element of the root element of the document at http://www.cafeconleche.org/, you now write http://www.cafeconleche.org/#element(/1/2/3) instead of http://www.cafeconleche.org/#/1/2/3.
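To make the child sequence notation concrete, here is a minimal Python sketch that resolves a pointer such as element(/1/2/3) against an ElementTree document. The resolve_child_sequence function is hypothetical, and it assumes only the pure child sequence form that starts with a slash; it does not handle any other form the element() scheme may allow.

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(root, pointer):
    """Resolve element(/1/...) against root, or return None if nothing matches.

    Hypothetical helper covering only pure child sequences.
    """
    if not (pointer.startswith("element(/") and pointer.endswith(")")):
        raise ValueError("expected a pointer like element(/1/2/3)")
    inner = pointer[len("element("):-1]            # e.g. "/1/2/3"
    steps = [int(step) for step in inner.split("/")[1:]]
    # The first step counts children of the document, so it must pick
    # the one and only root element.
    if steps[0] != 1:
        return None
    node = root
    for n in steps[1:]:
        children = list(node)                      # child elements only
        if n < 1 or n > len(children):
            return None
        node = children[n - 1]                     # steps are 1-based
    return node

doc = ET.fromstring('<a><b/><c><d/><e/><f id="target"/></c></a>')
found = resolve_child_sequence(doc, "element(/1/2/3)")
```

Here /1 is the root element a, /2 is its second child element c, and /3 is the third child element of c.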

Comments are due by July 31.

Wednesday, July 10, 2002

The UDDI 3.0 specification has been released. UDDI is the Universal Description, Discovery and Integration. It's an XML application that attempts to create "a platform-independent, open framework for describing services, discovering businesses, and integrating business services using the Internet, as well as an operational registry that is available today." I remain skeptical.

Meanwhile the W3C Web Services Description Working Group has published the first working draft of WSDL 1.2 and its bindings to SOAP 1.2, HTTP/1.1 GET/POST, and MIME. "WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information." According to the press release, improvements since 1.1 include:

  • Language clarifications
  • W3C XML Schemas and XML Information Set support
  • A conceptual framework approach to define the description components, that makes them simpler and more flexible.
  • Removal of "unnecessary and non-interoperable features from WSDL 1.1"
  • Better definition for the HTTP 1.1 and SOAP 1.2 bindings

Michel Rodriguez's XML::Twig 3.05 is a Perl module that subclasses XML::Parser to process XML through a tree-oriented interface. The tree is only built for parts of the document as needed.
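
The idea of building a tree only for the parts you need is not specific to Perl. As a rough analogue (this is Python's standard ElementTree, not XML::Twig, and the sample document is made up), the sketch below streams through a document and throws away each subtree as soon as it has been processed:

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample data, held in memory so the sketch is self-contained.
SAMPLE = b"""<catalog>
  <book id="b1"><title>XML in a Nutshell</title></book>
  <book id="b2"><title>Processing XML with Java</title></book>
</catalog>"""


def titles(stream):
    """Yield each book title, discarding every <book> subtree once handled."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "book":
            yield elem.findtext("title")
            elem.clear()  # drop the children and text we no longer need


print(list(titles(io.BytesIO(SAMPLE))))
```

The emptied book elements still hang off the root, so this is a sketch of the technique rather than a constant-memory parser.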


Norm Walsh has released version 1.52.2 of his XSLT stylesheets for DocBook to fix some chunking problems with yesterday's 1.52.1 release. I'm using it to write Processing XML with Java.

Tuesday, July 9, 2002

Daniel Veillard's released version 1.0.19 of libxslt, the GNOME XSLT library, and version 2.4.23 of libxml2, the GNOME XML parser for Linux. The new version of libxslt adds some EXSLT dynamic functions and xsl:sort order support, and fixes bugs. The new version of libxml2 mostly focuses on bug fixes and speed-ups.


Sean Russell's released version 2.4.0 of REXML, an open source, non-validating XML parser for Ruby. REXML includes a tree model parser, a SAX2 streaming parser, and a pull parser. It also includes a full XPath implementation. REXML is distributed under the Ruby license.


Luis Argerich's released the PHP XML Classes 1.9, a collection of PHP classes for processing XML. Version 1.9 adds an RDQL class. "This is a generic RDQL engine and an implementation of the engine to query RDF documents from files or URLs."


XMLEditor 0.5.3 is a GPL'd XML editor for Linux. Like many other editors, it has a tree-based user interface that is designed that way because it's easy for programmers to implement on top of standard widgets, not because it's what any author actually wants to use.

Monday, July 8, 2002

Norm Walsh has released version 1.77 of his DSSSL stylesheets for DocBook and version 1.52.1 of his XSLT stylesheets for DocBook. DocBook is an XML application for technical documents. I'm using it to write Processing XML with Java.

New features in the XSL release include: "A complete and consistent set of chunking parameters; new HTML Help parameters; support for new-style OLinks; experimental support for xref styles; completely reworked page master/sequence config; support for cross-references to paragraphs; new header/footer, column, and glossary parameters; other new parameters: draft.mode, suppress.footer.navigation and suppress.header.navigation, make.graphic.viewport, nominal.image.depth, nominal.image.width, use.embed.for.svg, refentry.title.properties, section.title.properties, use.embed.for.svg, generate.meta.abstract.xml". He's also updated the test suite. The DSSSL release mostly fixes bugs.


Walsh has also posted the first beta of DocBook slides 3.0, a customization layer for producing presentations in DocBook. This is not backwards compatible with version 2.0 of the DTD.

Sunday, July 7, 2002

The XML Apache Project has released version 0.20.4 of FOP, the popular open source XSL Formatting Objects (XSL-FO) to PDF converter. New features since 0.20.3 include:

  • Support for background-images
  • FOP should now work with any JAXP 1.1 compliant parser/transformer
  • FOP has been compiled with Jimi support
  • Logging has been changed from LogKit to Avalon's Logger Interface
  • New hyphenation patterns for Turkish, Portuguese and Czech
  • FOP should now work on an EBCDIC machine
  • Support for comma-separated values for the font-family property
  • Russian and Czech messages for AWTViewer
  • The AWTViewer can reload files
  • Support for fractional font sizes

Saturday, July 6, 2002

Microsoft's posted two updates of Internet Explorer for the Macintosh, Internet Explorer 5.1.5 for Mac OS 8 and 9 and Internet Explorer 5.2.1 for Mac OS X. Both fix a number of security holes.

Friday, July 5, 2002

I've posted version 1.0d9 of my XInclude processor for Java. This version still supports DOM, JDOM, and SAX. The JDOM support has been upgraded to the current CVS version of JDOM. It may work with JDOM beta 8. It probably won't work with earlier versions. The DOM version works again with Xerces 2.0.2. Earlier versions of Xerces mostly have nasty bugs that prevent it from working. I haven't done any extensive testing with other DOM implementations. For my own use, I mostly stick to the SAX version, which actually works quite well provided you don't need XPointer support.

There's one breaking API change. I renamed the package from com.macfaq.xml to com.elharo.xml.xinclude. (I plan to release some other things in the com.elharo.xml package in September.) I've also cleaned up the distribution quite a bit in this release. Some necessary JAR files are now bundled, and I've added a build file for Ant. Various bugs were fixed. The xi:fallback element is not supported yet. That's on my TODO list. I also plan to revise the inner workings of the DOMXIncluder class pretty radically based on techniques I learned while working on the DOM chapters of Processing XML with Java.
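
For readers who haven't seen XInclude in action, the following sketch shows the basic merge step: an xi:include element is replaced by the document its href points to. It uses Python's standard xml.etree.ElementInclude module rather than my Java processor, and the document names and the in-memory loader are hypothetical stand-ins for real files.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files", so the sketch needs no disk or network.
DOCUMENTS = {
    "chapter1.xml": "<chapter><title>Reading XML</title></chapter>",
}

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter1.xml"/>
</book>"""


def loader(href, parse, encoding=None):
    """Return an Element for parse='xml', or plain text for parse='text'."""
    text = DOCUMENTS[href]
    return ET.fromstring(text) if parse == "xml" else text


book = ET.fromstring(BOOK)
ElementInclude.include(book, loader=loader)  # modifies the tree in place
print(ET.tostring(book, encoding="unicode"))
```

After the call, the xi:include element is gone and the chapter element sits in its place.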

Thursday, July 4, 2002

The W3C XML Protocol Working Group has published a note on SOAP Version 1.2 Email Binding. "The motivation for this document is to illustrate the SOAP 1.2 Protocol Binding Framework and the creation of an alternative protocol binding specification to the Default HTTP binding. This second binding is meant to validate the Protocol Binding Framework for completeness and usability. Please note that this document is a non-normative description of an Email Binding."

Wednesday, July 3, 2002

I've updated XSL Formatting Objects, Chapter 18 of the XML Bible, to be fully conformant with the final recommendation of XSL-FO 1.0. Mostly the changes were fairly minor, just changing the master-name attribute to the master-reference attribute on about four elements. I also updated the FOP coverage to version 0.20.4, and fixed a few minor errors where I noticed them.

For this release, I pulled my original Word file into OpenOffice where I edited it and then saved it to HTML. The conversion from Word to OpenOffice was straightforward. The conversion from OpenOffice to HTML was not. Although OpenOffice's HTML is a lot cleaner than Word's, it still has a number of flaws. Most importantly, it does not properly handle lists or preformatted sections. It tries to represent them using CSS and FONT tags rather than basic UL and PRE elements. In general, there were lots of unnecessary CSS attributes and style tags. For instance, virtually every paragraph had its font color set to black. Many paragraphs had a lang attribute identifying the language as the empty string. I had to do a lot of manual cleanup to make the final result look good. This would have been easier if I could have used XSLT, but OpenOffice saves HTML 3.2, not XHTML or even well-formed HTML.

Still, this was no worse than doing the task in Word, which has similar issues. The one area where Word was notably superior to OpenOffice for this job was a very specific one involving search and replace. OpenOffice does not let you replace font qualities like "Bold" or "Times New Roman" with styles like "emphasis" or "BodyText". You can replace styles with styles and fonts with fonts, but not fonts with styles.

Tuesday, July 2, 2002

The fifth beta of Luxor, a GPL'd XML User Interface Language (XUL) toolkit for Java, has been posted. Luxor includes a web server, a portal engine that supports RSS, the Velocity template engine, a Python interpreter, and more. Beta 5 adds:

  • Apollo - Test Skeleton for Web Start/JNLP
  • Caramel - Java Extensions (non-GUI only)
  • Houston - Yet Another Status and Logging Toolkit
  • Rachel - Resource Loading
  • Salsa - Swing GUI Add-Ons
  • A Python interpreter
  • More docs: framed XUL tag reference, framed JNLP tag reference, etc.
  • More examples
  • Various bug fixes

Monday, July 1, 2002

Opera Software has updated its namesake Opera web browser for Windows to version 6.0.4. This release:

  • Optimizes memory
  • Improves the display when playing Windows Media Player files
  • Fixes printing of headers and footers

Opera supports direct display of XML with attached CSS stylesheets.


Sean Russell's released version 2.3.7 of REXML, an open source XML parser for Ruby. This release fixes some bugs, including better support for the document type declaration. REXML is distributed under the Ruby license.


Michael Fuchs has posted version 0.2.6 of his DocBook Doclet that creates DocBook SGML and XML documents from JavaDoc. This release fixes a couple of bugs.

Sunday, June 30, 2002

Sun's posted a maintenance release of Java Specification Request 67, Java APIs for XML Messaging. This has now been split into JAXM 1.1 and SAAJ 1.1. JAXM implements SOAP 1.1. SAAJ is an API for SOAP with Attachments that can be used by other specs like JAX-RPC without depending on the rest of JAXM.

Saturday, June 29, 2002

eCube's released version 1.2 of their catchXSL XSLT profiler. Version 1.2 adds a Swing-based GUI.


Version 0.3.0 of the open source FOA (Formatting Object Authoring tool) has been released. FOA is a Java application "that gives users a graphical interface to author XSL-FO stylesheets. With FOA you can generate pages, page sequences and fill them with content provided into one or more XML files. FOA will generate the XSLT stylesheet that transforms the XML content into an XSL-FO document." New features in 0.3.0 include a full FO table implementation, page numbers, and multiple headers and footers for each page sequence.

Friday, June 28, 2002

The W3C XML Protocol Working Group has updated six working drafts:

Last call for these ends July 19.

Thursday, June 27, 2002

Syncro Soft has released <oXygen/> 1.2, a $65 payware XML editor written in Java that can run as an applet. <oXygen/> 1.2 supports XSLT and XSL-FO, among other features.

Wednesday, June 26, 2002

Excelon Corporation has released Stylus Studio 4.0, a $399 payware XML IDE that supports XSLT, W3C XML Schemas, and DTDs. Notable features include the ability to debug XSLT stylesheets.


The Jakarta Apache project has released JXPath 1.0, an open source XPath interpreter for Java. What is unique is that JXPath can apply XPath expressions to graphs of objects of non-XML types such as JavaBeans, Collections, arrays, Maps, Servlet contexts, and combinations thereof.
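
To give a feel for what applying path expressions to non-XML objects means, here is a toy Python sketch. It is emphatically not JXPath and not real XPath; the select function, the Book class, and the sample data are all hypothetical, and only slash-separated steps with an optional 1-based index are understood.

```python
def select(root, path):
    """Follow a slash-separated path through dicts, lists, and objects.

    Each step is a dict key or attribute name, optionally followed by a
    1-based index such as 'authors[2]' (XPath counts from 1, not 0).
    """
    current = root
    for step in path.strip("/").split("/"):
        name, _, index = step.partition("[")
        if isinstance(current, dict):
            current = current[name]
        else:
            current = getattr(current, name)
        if index:
            current = current[int(index.rstrip("]")) - 1]
    return current


class Book:
    def __init__(self, title, authors):
        self.title = title
        self.authors = authors


# Made-up object graph mixing a dict, a plain object, and a list.
library = {"featured": Book("Example Title", ["Alice", "Bob"])}
print(select(library, "/featured/authors[2]"))
```

The same expression works whether a step lands on a map entry, a bean-style property, or a collection element, which is the point of the approach.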


Version 0.95 of Sablotron, an open source XML processor for C++, has been released. Version 0.95 supports XSLT 1.0, XPath 1.0, DOM Level 2, and some extension functions from EXSLT.

Tuesday, June 25, 2002

Kohsuke Kawaguchi's released RelaxNGCC 1.0, a compiler compiler data binding tool for XML based on the RELAX NG schema language. RelaxNGCC is published under the GPL. RelaxNGCC reads a RELAX NG schema and generates a matching SAX ContentHandler that produces Java objects.
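
The kind of code such a tool saves you from writing looks roughly like the following hand-written handler, shown here in Python with the standard xml.sax module rather than Java. The Release class, the element names, and the input document are assumptions invented for this sketch; RelaxNGCC would derive the equivalent logic from a RELAX NG schema instead.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class Release:
    name: str
    version: str


class ReleaseHandler(xml.sax.ContentHandler):
    """Turns <release version="..."><name>...</name></release> into objects."""

    def __init__(self):
        super().__init__()
        self.releases = []
        self._version = None
        self._text = None

    def startElement(self, tag, attrs):
        if tag == "release":
            self._version = attrs.get("version", "")
        elif tag == "name":
            self._text = []  # start collecting character data

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.releases.append(Release("".join(self._text), self._version))
            self._text = None


handler = ReleaseHandler()
xml.sax.parseString(
    b'<releases><release version="2.4.0"><name>REXML</name></release>'
    b'<release version="1.0.3"><name>Piccolo</name></release></releases>',
    handler,
)
print(handler.releases)
```

Even for two elements the state tracking is tedious, which is why generating the handler is attractive.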


Jochen Loewer's tDOM 0.7.1 is an XML/DOM/XPath implementation for Tcl written in C.

Monday, June 24, 2002

Version 1.6 of AxKit, the Perl-based XML Application Server Framework for Apache, has been released. AxKit converts XML to other formats such as HTML, WAP and text on the fly using either W3C standard techniques like XSLT and XInclude or custom code. New features in this release include:

  • Content and Style Providers are separated so the stylesheets don't have to pass through the same module the XML goes through.
  • A SAXMachines language module
  • An axkit URI scheme
  • A new AxTraceIntermediate debug option makes AxKit save to a file at every intermediate stage of processing
  • Many bug fixes

Saturday, June 22, 2002

The XML Apache Project has released version 2.0.2 of Xerces-J, the popular open source XML parser for Java. Xerces supports schemas, SAX2, DOM2, and XNI (Xerces Native Interface). Version 2.0.2 improves PSVI and DOM Level 3 support. It also fixes numerous bugs.


The IETF/W3C XML Signature Working Group has posted the last call working draft of the XML-Signature XPath Filter 2.0. According to the abstract, "XML Signature [XML-DSig] recommends a standard means for specifying information content to be digitally signed and for representing the resulting digital signatures in XML. Some applications require the ability to specify a subset of a given XML document as the information content to be signed. The XML Signature specification meets this requirement with the XPath transform. However, this transform can be difficult to implement efficiently with existing technologies. This specification defines a new XML Signature transform to facilitate the development of efficient document subsetting technologies that interoperate under similar performance profiles." According to the Intro, "The goal is to (1) more easily specify XPath transforms and (2) more efficiently process those transforms." Comments are due by July 11.

Friday, June 21, 2002

Matt Fausey's released Chilkat XML, a closed source, free-beer, non-validating XML parser for Windows COM that supports DOM.


Eric van der Vlist has introduced the XML Validation Interoperability Framework (xvif), "a proposal for embedding pipes of transformations and validations within grammar based schema languages." In contrast with document based piping approaches, this focuses on "micro-pipes" that operate on individual information items such as attributes, text nodes, elements and so forth. The prototype is based on Relax NG and Python.


In a similar initiative, Rick Jelliffe's proposed XML: Schemachine (Link is a PDF). Schemachine splits a document into parts by various criteria and applies different validation languages to each part. It differs from XVIF in that it works from outside the schema language rather than inside the schema language.


Design Science has posted the fourth beta of MathPlayer, a MathML plug-in for Internet Explorer. The big new feature in this beta is a right-click menu that allows you to copy MathML equations from web pages.


Version 2.21 of SVG.pm, a Perl library for working with Scalable Vector Graphics via DOM, has been posted to CPAN.

Thursday, June 20, 2002

I'm pleased to announce that the second edition of XML in a Nutshell has been released and can now be found at fine computer bookstores everywhere including Amazon.com. Powells, Buy.com, and Barnes & Noble show it on back order, but presumably they'll get stocks soon. Amazon almost always sells out of their initial runs of my books within the first few hours of me announcing them here, but they generally get stock back in very quickly. The list price is $39.95, but Amazon has its usual 30% discount.

For those of you who've read and enjoyed the first edition, the obvious questions are "What's new?" and "Is it worth shelling out another $30 to update?" The latter question may depend on your pocketbook and just how dog-eared your copy of the first edition is, but I can tell you what's new.

The single biggest addition since the first edition is complete coverage of W3C XML Schemas. Scott wrote a tutorial chapter and I wrote a reference chapter that together cover every last element and attribute in the language. Jeni Tennison did an excellent tech review of those chapters, and taught us many things about the schema spec that were far from obvious. I think the result is the most complete and accurate reference to the W3C XML Schema Language you'll find anywhere. I also wrote a new chapter about RDDL, the Resource Directory Description Language, an XML application based on modular XHTML that can be used for XML documents placed at the end of namespace URIs.

The remainder of the book is very similar to the first edition in overall structure and layout. However, we did rewrite every chapter to bring it up to date with the state of the art of XML in 2002. In the process, we added a lot of new and updated content to the existing chapters including:

  • SAX filters
  • JAXP, the Java API for XML Processing
  • TrAX, the Transformations API for XML
  • Unicode 3.1
  • XLink 1.0
  • XPointer, 2nd candidate recommendation
  • XSL Formatting Objects 1.0

Of course, we also corrected any mistakes we found along the way, or that had been pointed out to us by readers of the first edition. This includes the infamous bug on p. 35 that replaced ?, *, and + with *, *, and *. (FYI, I have no idea how that slipped in originally. Scott went through the drafts of the first edition and verified that it was correct right up through the final page proofs. Somewhere between the page proofs and the printer, the ? and + got changed to asterisks.) Overall, though, I think the first edition was a pretty good book and this one's even better. If you're working with schemas, then I think you'll want a copy of the second edition immediately. If not, you may be able to hold out until the pages start falling out of the first edition (which may take a while; O'Reilly uses pretty durable binding). The second edition of XML in a Nutshell is $39.95 and is now or soon will be available at bookstores everywhere.

Wednesday, June 19, 2002

The XML Apache Project has posted the third beta of Batik 1.5, an open source SVG display engine based on Java 2D. Beta 3 fixes many bugs and improves script support security control. In addition, the SVGBrowser has been renamed "Squiggle".

Tuesday, June 18, 2002

Andy Clark's posted the CyberNeko Tools for Xerces Native Interface (NekoXNI). This is a collection of XML tools written specifically to take advantage of the XNI API in Xerces2 including the NekoHTML parser and the NekoDTD parser. For the first time, this release also includes the CyberNeko Style Processor (version 0.1), an XML batch processing framework.


Victor Minghir's DBxml 0.0.2 is a simple MySQL client for Linux that outputs XML.


XMLUnit 0.6 is an extension to the popular JUnit testing framework that allows assertions to be made about the equality of whole XML Documents, XPath result trees, and XPath expressions. It requires a JAXP/TrAX-compliant parser.
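
The core difficulty such assertions address is that two XML documents can differ as strings yet be the same document: attribute order, quote style, and insignificant white space all vary. The Python sketch below illustrates one way to compare documents rather than strings, by canonicalizing both sides first. It is not XMLUnit's API; assert_xml_equal is a hypothetical helper, and ElementTree's canonicalize function requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def assert_xml_equal(expected, actual):
    """Fail unless both strings canonicalize to the same document.

    strip_text=True ignores white space around text content, which is an
    assumption about what should count as "equal" for a given test.
    """
    canon_expected = ET.canonicalize(expected, strip_text=True)
    canon_actual = ET.canonicalize(actual, strip_text=True)
    if canon_expected != canon_actual:
        raise AssertionError(canon_expected + " != " + canon_actual)


# Different attribute order, quoting, and indentation; same document.
assert_xml_equal(
    '<a x="1" y="2"><b/></a>',
    "<a y='2' x='1'>\n  <b></b>\n</a>",
)
```

Whether white space should be ignored depends on the vocabulary being tested, so a real test suite would make that choice explicit per test.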


Topologi has posted a new version of the Schematron Validator for Windows (registration required). Schema languages supported include:

  • Schematron
  • DTDs
  • RELAX NG (including embedded Schematron schemas)
  • W3C XML Schemas (including embedded Schematron schemas)

It can also be used as a shell for XSLT processing chains.


Microsoft has released version 5.2 of Internet Explorer for Mac OS X, a web browser that supports direct display of XML documents with attached CSS style sheets. New features include better handling of fonts in Mac OS X 10.1.5.

Monday, June 17, 2002

The XML Apache Project has posted a release candidate of FOP 0.20.4, the popular open source XSL Formatting Objects (XSL-FO) to PDF converter. New features since 0.20.3 include:

  • Support for background-image
  • FOP should now work with any JAXP 1.1 compliant parser/transformer
  • FOP has been compiled with Jimi support
  • Logging has been changed from LogKit to Avalon's Logger Interface
  • New hyphenation patterns for Turkish, Portuguese and Czech
  • FOP should now work on an EBCDIC machine
  • Support for comma-separated values for the font-family property
  • Russian and Czech messages for AWTViewer

Norm Walsh has published version 2.0.6 of DocBook: The Definitive Guide. This open source book describes the upcoming DocBook 2.4 release. He's also posted a version that describes just Simplified DocBook.

Walsh has also released version 1.51.1 of the DocBook XSL Stylesheets. New features include:

  • An extension function to determine the intrinsic size of an image.
  • Callout bullets 11-15
  • New configurable parameters including points.per.em, generate.manifest, manifest, compact.list.item.spacing, html.extra.head.links, and use.svg
  • Support xref on any element that has a title
  • MathML can be passed through unchanged
  • Reworked support for graphic attributes in HTML
  • Support the shade.verbatim parameter in XSL-FO
  • Support compact list spacing in XSL-FO

Sunday, June 16, 2002

The Apache XML Project has posted the first developer's release of Xalan-J 2.4, an open source XSLT processor for Java. The major change in this release is that Xalan now works with Xerces 2.x. The big new feature is support for the EXSLT extension library. And of course numerous bugs are fixed, and probably a few new ones have been introduced.

Saturday, June 15, 2002

Luis Argerich's released the PHP XML Classes 1.5, a collection of PHP classes for processing XML.


Andy Clark's posted the CyberNeko Tools for Xerces Native Interface (NekoXNI). This is a collection of XML tools written specifically to take advantage of the XNI API in Xerces2. This release includes an updated version of the NekoHTML parser and the initial release of NekoDTD. NekoDTD is a DTD parser that uses the XNI framework and the Xerces2 DTD scanner implementation to convert DTDs into XML instance document syntax.

Friday, June 14, 2002

Yuval Oren's released Piccolo 1.0.3, a very fast, open source, non-validating SAX2 parser. I've been a Xerces partisan for a while, but Piccolo's made me rethink that. If you need a fast, conformant, pure SAX parser for Java without all the overhead of monster parsers like Xerces and Oracle, then Piccolo may be right for you. Version 1.0.3 fixes a few minor bugs.


Yann Dirson's released sgml2x 1.0.0, a DSSSL formatter for XML and SGML based on jade. This release fixes some bugs and configuration problems.

Thursday, June 13, 2002

Jens Låås's xmlclitools 1.25 are four Linux command-line tools for searching, modifying, and formatting XML data. The tools are designed to work in conjunction with standard utilities such as grep, sort, and shell scripts. They are published under the LGPL.


Jochen Wiedmann's released JaxMe 1.2.4, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. It does support JDBC mapping to an SQL table and reading from joined tables. This release vastly improves support for Ant.

Wednesday, June 12, 2002

Before I'd even finished upgrading all my machines to Mozilla 1.0, the Mozilla Project has posted the first alpha of the Mozilla 1.1 open source web browser for the usual batch of platforms: Mac OS 9.1, Mac OS X, Linux, Windows, etc. New features include the ability to view HTML e-mail as plain text, Quartz rendering for Mac OS X 10.1.5 users, new layout performance enhancements targeted at DHTML, faster startup times, View Source for MathML and selections, XBM image support, better drag and drop, better image blocking for Mail & News, and more.


Sun's posted the Java XML Pack Summer 02 Release, a bundle of various XML related technologies including the next version of the Java API for XML Processing, JAXP 1.2. This release includes:

  • Java API for XML Processing (JAXP) 1.2
  • Java API for XML Messaging (JAXM) 1.1
  • Java API for XML Registries (JAXR) 1.0_01
  • Java API for XML-based RPC (JAX-RPC) 1.0
  • SOAP with Attachments API for Java (SAAJ) 1.1
  • JavaServer Pages Standard Tag Library (JSTL) 1.0
  • Xalan-J 2.3.1
  • XSLTC 2.3.1
  • Xerces-J 2.0.1

The big new feature in JAXP 1.2 is W3C XML Schema Language support. This release also makes the move from Crimson to Xerces-2 as the default implementation.


Sun has also released the Java Web Services Developer Pack (Java WSDP). This pack bundles together all of the above plus several more APIs including:

  • JavaServer Pages Standard Tag Library (JSTL) 1.0
  • Java WSDP Registry Server 1.0_01
  • Web Application Deployment Tool
  • Ant 1.4.1
  • Tomcat 4.1.2

Tuesday, June 11, 2002

The W3C DOM Working Group has published the candidate recommendation draft of Document Object Model (DOM) Level 2 HTML Specification. According to the abstract, this "specification defines the Document Object Model Level 2 HTML, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of [HTML 4.01] and [XHTML 1.0] documents. The Document Object Model Level 2 HTML builds on the Document Object Model Level 2 Core [DOM Level 2 Core] and is not backward compatible with DOM Level 1 HTML [DOM Level 1]." Comments are due by July 1.


The W3C Web Services Description Working Group has published its initial public working draft of Web Service Description Usage Scenarios, which describes "the Usage Scenarios guiding the development of the Web Service Description specification." They warn that this does not necessarily represent consensus within the Working Group and it "may change substantially due to coordination and consolidation efforts with Web Services Usage Scenarios work undertaken in the Web Services Architecture Working Group."


Stefan Champailler's DTDDoc 0.0.3 is a JavaDoc-like tool for creating HTML documentation of document type definitions from embedded DTD comments. This release adds support for "multiple depths of directories" in the DTD. DTDDoc is published under the GPL.

Monday, June 10, 2002

The Jakarta Commons Project has posted the second beta of JXPath 1.0, an XPath interpreter that can apply XPath expressions to graphs of objects of various kinds: JavaBeans, Collections, arrays, Maps, Servlet contexts, DOM, etc., including mixtures thereof. It is extensible, allowing the developer to customize support for existing object models and introduce support for new ones. Beta 2 fixes numerous bugs and updates the documentation.


Sean Russell's released version 2.3.5 of REXML, an open source XML parser for Ruby. This release fixes some bugs, including better support for the document type declaration. REXML is distributed under the Ruby license.


IBM's alphaWorks has updated their P3P Policy Editor to support Java 1.4 and fix a few bugs. This is a GUI tool for creating Web site privacy policies that can be interpreted by Web browsers and other user agents that support the Platform for Privacy Preferences (P3P).

Sunday, June 9, 2002

Version 1.2.5 of the open source Galeon web browser for Linux has been released. Galeon is based on the Mozilla/Gecko rendering engine so it includes XML support. However, it is just a browser, no e-mail, no chat, no news. I've been using an earlier version on my Linux box (where it was installed by default) and so far it's pretty nice.


Dave Beckett's posted version 0.9.11 of the open source Redland RDF library for C, Perl, Ruby, Python, Java, and Tcl. Redland provides high-level APIs for the Resource Description Framework (RDF), allowing it to be stored, parsed, queried, and manipulated. Redland has an object-based, modular design.

Saturday, June 8, 2002

IBM's alphaWorks has released the Java Record Object Model (JROM), "a tool that provides an in-memory tree representation of instances of structured, typed information and that is based on the XML Schema data type system. JROM values either are typed, simple values for first-level data, or they are complex values that can contain an arbitrary number of elements and attributes."

Friday, June 7, 2002

I've done a major rewrite to the XPath chapter of Processing XML with Java to try and make it more practical and less theoretical. If you had trouble with it before, I'd appreciate your checking it out again and letting me know if you think this version is an improvement.


Bertrand Delacretaz has posted jfor 0.6.0, an open source XSL-FO to RTF converter. The FOP integration is slowly progressing - design discussions have been going on, and FOP internals are being modified to allow easier integration of other formats besides PDF. There is currently no set date for when actual RTF/FOP code will be available to play with. This release improves table support and fixes various bugs.


Adrian Mouat's diffxml 0.9 Alpha provides diff and patch utilities "which operate on the hierarchical structure of XML documents." diffxml is published under the GPL and written in Java. It uses the XMLPULL API to read the XML document.


Dan Allen's released the XML_XPath PEAR Class 1.1, a GPL'd PHP class that "Allows for easy manipulation, maneuvering and querying of a domxml tree using both xpath queries and DOM walk functions. It uses an internal pointer for all methods on which the action is performed. Results from an xpath query are returned as an XPath_Result object, which contains all the DOM functions from the main object. This class tries to hold as close as possible to the DOM Recommendation."

Thursday, June 6, 2002

Mozilla 1.0 has been released for Mac OS, Windows, OpenVMS, Solaris, BSD/OS, FreeBSD, OS/2, Tru64 Unix, and Linux. The open source Mozilla web browser has the best support for XML of any browser on the market today. Standards it supports include HTML 4.0, XML 1.0, the Resource Description Framework (RDF), Cascading Style Sheets Level 1 (CSS1) and Level 2 (CSS2), the Document Object Model Level 1 (DOM1) and Level 2 (DOM2), and XHTML. The entire user interface is written in XUL, the XML User Interface language. Java is supported via Sun's Java plug-in. On top of that, it lets you set the search function to Google and turn off pop-up ads. Mozilla has been my primary browser for the last year or so on Windows and Linux. Unfortunately, on the Mac it still has at least one show stopping bug in AppleScript that keeps me from using it regularly. This won't affect most users, but it's absolutely essential to my workflow for this site. :-(


In related news, Beonex has posted Beonex Communicator 0.8, a version of Mozilla 1.0 "polished for end users." It includes Navigator, Mailnews, Composer, and ChatZilla. I'm not quite sure how this differs from Mozilla itself, which is the friendliest browser I've ever used, but it might be worth checking out.


The Gnome Project has released Gnumeric 1.0.7, an open source Excel compatible spreadsheet for Linux that saves its files in XML. This is a bug fix release.

Wednesday, June 5, 2002

Over the last couple of weeks I've done the initial author review and made a number of changes to Processing XML with Java. All the chapters have been rewritten. Some of the most significant changes include:

  • I moved the SOAP schemas out of Chapter 2, XML Protocols, into Appendix B. I also updated them to the official schemas for SOAP 1.1.

  • In Chapter 4, Converting XML, I added some material on the necessity of streaming solutions for large documents. I added some diagrams and text designed to clarify exactly how the conversion example operates. I also updated the XQuery examples to the latest working draft.

  • Chapter 5, Reading XML, probably changed the most. I reorganized this chapter significantly with much earlier, high-level discussion of the various APIs. I added the new XMLPULL API, corrected some bugs in the SAX example, and updated the ElectricXML section to ElectricXML 4.0.

  • I added a sidebar on Measuring DOM Size to Chapter 9, DOM. I'm still trying to improve these numbers to the point where they're reproducible and I actually believe them, though.

  • I updated Chapter 13, DOM Output, to cover the latest working draft of DOM 3 Load and Save.

  • I added a new section on Java integration to Chapter 14, JDOM, since this is one of the major benefits of JDOM.

  • In Chapter 16, XPath, I updated the Jaxen section to the final, release version of that API.

  • I combined the JDOM and JAXP quick reference appendixes into one larger XML APIs appendix. I also added XMLPULL to that appendix.

In addition, there were many minor edits, corrections, and clarifications throughout. Thanks are due to everyone who sent in comments and suggestions, especially the technical reviewers Mike Champion, Robert W. Husted, Anne T. Manes, Ron Weber, and John Wegis.

Tuesday, June 4, 2002

The Apache Project has released Apache SOAP 2.3, an open-source implementation of SOAP 1.1 and SOAP Messages with Attachments in Java.


Yann Dirson's posted version 0.99.6 of sgml2x, a DSSSL formatter for XML and SGML based on jade. This release fixes some bugs and adds a dssslproc config file, declaration of stylesheet inheritance, and more modular style definitions.


Lucid'i.t. has posted the third alpha of the Lucid XML Toolkit 1.0. It includes a validating SAX parser and a DOM Level 2 implementation. It partially implements the W3C XML Schema Language. Java 1.1 or later is required.


SyncRo Soft Ltd has released version 1.1.9 of <oXygen/>, a $35 payware XML editor written in Java.

Monday, June 3, 2002

Sascha Leib posted a beta of rXML, an open source XML parser for REALBasic. rXML is published under the Lesser General Public License (LGPL).


Michael Fuchs has posted version 0.2.4 of his DocBook Doclet that creates DocBook SGML and XML documents from JavaDoc. This release can run as a standalone application.

Thursday, May 30, 2002

I'm travelling this weekend so updates may be a little slow until Monday.


Stefan Champailler's DTDDoc 0.0.1 is a JavaDoc-like tool for creating HTML documentation of document type definitions from embedded DTD comments. DTDDoc is published under the GPL.


Altova's released version 4.4 of XML Spy Suite, their popular $399 payware XML editor. New features include DocBook editing and a built-in multi-language spell-checker for English (British, US, Canadian), German, Italian, Portuguese, Spanish, French, Dutch, Swedish, and other languages. In addition, separate English medical and legal dictionaries are included. Upgrades from previous 4.x versions are free.


RO IT systems GmbH has released the Perl SVG module 2.1. New features in version 2.1 include support for ActiveState's Perl Package Manager. In addition, a number of bugs have been fixed, and lots of pieces have been sped up.


Rogue Wave's posted an early access release of Ratchet, "a means to map XML documents into a C++ representation". Ratchet "generates classes that represent definitions that comply with the W3C XML Schema Recommendation. The schema compiler generates source code, makefiles and HTML class reference documentation for the C++ language. The generated classes comprise a object model that conceptually represents the XML Schema definitions. The object model consists of a marshalling framework that allows serialization of objects to and from XML instances." Of course, this tool shares the common fallacies of all such tools; that is,

  • All documents of interest have schemas.
  • All documents of interest that do have schemas are valid according to their schemas.

Neither assumption is true except in very limited circumstances. Ratchet will likely be payware when finally released.
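
To make the point concrete, here is a minimal sketch in plain JAXP of processing a document that has no schema at all. The class name SchemaFreeReader and the sample order markup are invented for illustration; nothing here comes from Ratchet or any other binding tool.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SchemaFreeReader {

    // Parses well-formed XML without consulting any DTD or schema
    // and returns the local name of the root element.
    public static String rootName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // already the default; stated for emphasis
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getLocalName();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: no DOCTYPE and no schema location hint.
        String xml = "<order><item sku=\"E-17\">peanuts</item><gift/></order>";
        System.out.println(rootName(xml));
    }
}
```

A generic API like DOM only needs the document to be well-formed. A class generated from a schema, by contrast, has nowhere to put an element the schema did not anticipate.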

Wednesday, May 29, 2002

Aleksey's XML Security Library 0.0.6 is an open source C library for Windows and Linux based on LibXML2 and OpenSSL that supports XML Signature and XML Encryption.


Daniel Veillard's released version 1.0.18 of libxslt, the GNOME XSLT library and version 2.4.22 of libxml2, the GNOME XML parser for Linux. This is a bug fix release.


The W3C Quality Assurance (QA) Activity has published initial working drafts of four specifications on quality assurance:

These describe "a common framework for enhancing the quality practices of the W3C Working Groups in the areas of specification editing, production of test materials, and coordination efforts with internal and external groups."


The IETF/W3C XML Signature Working Group has published the proposed recommendation of Exclusive XML Canonicalization Version 1.0. Quoting from the abstract,

Canonical XML [XML-C14N] specifies a standard serialization of XML that, when applied to a subdocument, includes the subdocument's ancestor context including all of the namespace declarations and attributes in the "xml:" namespace. However, some applications require a method which, to the extent practical, excludes ancestor context from a canonicalized subdocument. For example, one might require a digital signature over an XML payload (subdocument) in an XML message that will not break when that subdocument is removed from its original message and/or inserted into a different context. This requirement is satisfied by Exclusive XML Canonicalization.

Opera Software has released version 6.0.3 of their namesake Opera web browser for Windows and version 6.0.1 for Linux. Opera supports direct display of XML with CSS stylesheets. XSLT is not supported. This release fixes a few bugs including one security hole so all users should upgrade. Opera is $39 payware or free-beer adware.


Andy Clark's updated his NekoHTML open source HTML parser to version 0.6.3. According to Clark,

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.

NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.

This release fixes some more bugs.

Tuesday, May 28, 2002

Oracle's released release 2 of version 9i of their namesake database. This release adds lots of support for integrating XML with relational data including a native XMLType to go along with INTEGER, REAL, BLOB and the other SQL types. The SQLX extensions to SQL are supported. There's in-database support for W3C XML Schemas and XPath 1.0, XSLT 1.0, and DOM operations on XMLType fields via SQL, PL/SQL and JDBC. Oracle is rather expensive payware for Windows NT/2000/XP and various Unixes. The exact cost depends on how much your local Oracle salesperson thinks you can afford.

Monday, May 27, 2002

Oracle's posted an XQuery prototype that can query local and remote XML documents through a command line interface or a Java API.


Danny Vint has posted two Quick Reference Cards for XML Schemas in PDF, formatted for an 11 by 17 tabloid sized printer. I propose a design goal for all future W3C specifications: It should be possible to fit a complete reference for the specification onto no more than two 8 1/2 by 11 inch pages, with at least one half inch margins on each side, in a font a senior citizen can read.


Andy Clark's updated his NekoHTML open source HTML parser to version 0.6.2. According to Clark,

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.

NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.

This release fixes a nasty bug introduced in 0.6.1.

Sunday, May 26, 2002

The draft international standard of the RELAX NG schema language has been published in PDF and XML formats.


David Merrill's wt2db converts WikiText source files into DocBook XML. WikiText is a common format in WikiWikiWebs.

David Merrill has also posted texi2db 0.4.2, a Perl script that converts a Texinfo file into DocBook XML. Both are published under the GPL.


Galeon 1.2.3 for Linux/Gnome has been released. Galeon is a stripped down web browser based on Mozilla's Gecko rendering engine. Unlike Mozilla, this is a browser only, not an e-mail client/IRC tool/newsgroup reader/food processor. I've used it a little on my new Linux box, and have been favorably impressed. 1.2.3 is based on Mozilla 1.0RC3.

Saturday, May 25, 2002

TM4J 0.6.4 is an open source topic map processing toolkit for Java as well as a set of topic map processing tools. Topic maps are an ISO standard for the interchange of information structures which can be used to represent ontologies, business data and processes, individual knowledge and opinions, and more. This engine processes files conforming to the XML Topic Maps (XTM) specification and stores them either in memory or in a persistent store, providing access via a Java API. This is a bug fix release.

Friday, May 24, 2002

As Murphy's Law requires, the day after I finally got around to upgrading to Mozilla RC2 on my Windows box, the Mozilla Project posted Mozilla RC3. There are no new features in this release, but lots of bugs have been fixed. This has been my default browser on Windows for almost a year now, but the Macintosh version still has at least one show stopping bug in AppleScript that keeps me from using it regularly.

Also, Hans-Joachim Matheus wrote in to report that the Netscape 7 prerelease I mentioned yesterday appears to be based on Mozilla 1.0RC2. He cites this article from Netscape Germany.

In other browser news, Galeon 1.2.2 for Linux/Gnome has been released. Galeon is a stripped down web browser based on Mozilla's Gecko rendering engine. Unlike Mozilla, this is a browser only, not an e-mail client/IRC tool/newsgroup reader/food processor. I've used it a little on my new Linux box, and have been favorably impressed. 1.2.2 is based on Mozilla 1.0RC2.


Ed Avis's XMLTV 0.5 is a set of programs to process television listings into an XML-based format. There are backends to download TV listings for Canada, the USA, Britain, Austria, and Germany. It also includes some filter tools to sort, grep, print, and munge listings, and two end-user programs to plan a week's TV viewing.


Ron Bourret's posted the third alpha of XML-DBMS 2.0, a set of Java packages for transferring data between an XML document and a relational database using an object-relational mapping. It includes a flexible, XML-based mapping language for describing the mappings. New features in version 2.0 include heterogeneous joins, a filter language, updates and deletes (including insert-or-update semantics), support for database-generated keys, connection and statement pooling, custom formatting, and limited transformations in the mapping language. The major new functionality in alpha 3 is a map generation tool that can start with an XML-DBMS map, a DTD, or a database schema and generate an XML-DBMS map, a DTD, or a set of CREATE TABLE statements.


Randy J. Ray's RPC::XML 0.41 is a Perl class library for implementing XML-RPC services both from the client side and the server side.

Thursday, May 23, 2002

Netscape's posted the first preview release of Netscape 7.0 for Windows, Mac, and Linux. Netscape 7.0 supports direct display of XML in the browser with either CSS or XSLT stylesheets. The major new features for this release (at least from an XML perspective) are P3P and MathML presentation markup support. For MathML, you'll need to download some extra fonts. This is based on the Mozilla code base, though I haven't yet figured out which release.


Luis Argerich has released PHP XML Classes 1.2. Version 1.2 adds a new XQuery Lite 1.0 class. XQuery Lite is a subset of XQuery 1.0. Other classes in this package support Xindice, Schematron, XSLT, and SAX filters.


Jez Higgins has posted a new release of SAX in C++, a set of SAX2 bindings for C++ that includes SAX2 wrappers for expat, libxml, Xerces, and MSXML.


Andy Clark's updated his NekoHTML open source HTML parser to version 0.6.1. According to Clark,

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.

NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.

The major new feature in this release is that content after the closing </html> tag is ignored.

Wednesday, May 22, 2002

The second candidate release of DocBook 4.2, an XML application for technical documentation such as Processing XML with Java, has been posted. This release includes a customization layer that can add Scalable Vector Graphics (SVG) support to DocBook.


Opera Software has released version 6.0 of their namesake Opera web browser for Linux. Opera supports direct display of XML with CSS stylesheets. XSLT is not supported. This release mostly speeds things up and fixes a few bugs. However it does add one major new feature, an SMS (Short Messaging Service) panel that lets users send text messages to mobile phones directly from the browser. Currently this only works in Norway, but it will expand to other countries in the future. Opera is $39 payware or free-beer adware.

Tuesday, May 21, 2002

I thought today I'd post a short fable that I wrote for the Preface of Processing XML with Java:

One night five developers, all of whom wore very thick glasses and had recently been hired by Elephants Inc., the world’s largest online purveyor of elephants and elephant supplies, were familiarizing themselves with the company’s order processing system when they stumbled into a directory full of XML documents on the main server. “What’s this?”, the team leader asked excitedly. None of them had ever heard of XML before so they decided to split up the files between them, and try to figure out just what this strange and wondrous new technology actually was.

The first developer, who specialized in optimizing Oracle databases, printed out a stack of FMPXMLRESULT documents generated by the FileMaker database where all the orders were stored, and began poring over them. "So this is XML! Why, it’s nothing novel. As anyone can see who’s able, an XML document is nothing but a table!"

“What do you mean, a table?” replied the second programmer, well versed in object oriented theory and occupied with a collection of XMI documents representing UML diagrams for the system. “Even a Visual Basic programmer could see that XML documents aren’t tables. Tables can’t contain duplicates! These are more like objects and classes. Indeed, that’s it exactly. An XML document is an object and a DTD is a class.”

“Objects? A strange kind of object, indeed!” said the third developer, a web designer of some renown, who had loaded the XHTML user documentation for the order processing system into Mozilla. “I don’t see any types at all. If you think this is an object, I don’t want to install your software. But with all those stylesheets there, it should be clear to anyone not sedated, that XML is just HTML updated!”

“HTML? You must be joking” said the fourth, a computer science professor on sabbatical from MIT, who was engrossed in an XSLT stylesheet that validated all the other documents against a Schematron schema. “Look at the clean nesting of hierarchical structures, each tag matching its partner as it should. I’ve never seen HTML that looks this good. Clearly what we have here is S-expressions which is certainly nothing new. Babbage invented this back in 1882!”

“S expressions?” queried the technical writer, who was occupied with technical documentation for the project written in DocBook. “I’ve never heard of such a thing. To me, this looks just like a FrameMaker MIF file, though finding the GUI does seem to be taking me awhile.”

And so they argued into the night, none of them willing to give an inch, all of them presenting still more examples to prove their points, none of them bothering to look at the others’ examples.

If the moral of this little story hasn't hit you over the head yet, you can keep reading in the recently posted Preface of Processing XML with Java. As usual, all comments are appreciated.

Monday, May 20, 2002

Adobe's released FrameMaker 7.0, a $799 desktop publishing package for Windows, Mac OS 9 (but not Mac OS X), HP-UX, AIX, and Solaris. This release combines the features of the previously separate FrameMaker and FrameMaker+SGML at the lower price of the two. New features include:

  • Ability to import, validate, and export XML files and DTDs for "XML roundtripping"
  • XML namespace support
  • Unicode support for XML
  • Automatic generation of CSS stylesheets for XML files
  • DocBook 4.1, DocBook 4.1.2, and XHTML sample applications included for structured authoring
  • eXtensible Metadata Platform (XMP) support
  • WebDAV support
  • Tagged PDF generation for better document accessibility and logical reflow of documents
  • Windows 2000 accessibility compatibility "including an available high-contrast user interface and an extensive set of keyboard shortcuts"
  • Alternate text descriptions for graphics
  • Automatic association of master pages to pages based on paragraph styles or element tags
  • More flexibility in custom master pages for reordering in any sequence
  • Up to 12 available running header and footer variables for more complex documents
  • Select/Deselect All option in the Import Formats dialog box for simpler importing
  • Improved UNIX font support including TrueType, OpenType, and Type 1 formats
  • Import filters for RTF 1.6 and the "latest versions of Microsoft Office files"
  • Automatic creation of HTML versions of documents through the included Quadralay WebWorks Publisher Standard Edition 7.0 software, plus templates for publishing to the HTML 3.2, XML, Microsoft Reader, and Palm Reader formats
  • SVG support on MacOS, Windows and Solaris

I'd love to hear from anyone who knows from experience whether this release can manage large DocBook documents (or for that matter if anyone from Adobe is reading this and feels like tossing me an eval copy, I'll check it out for myself and report the results here.)

Sunday, May 19, 2002

The W3C Cascading Style Sheets (CSS) Working Group has published one new and three updated CSS Level 3 Working Drafts:

CSS3 module: line

This first public working draft presents a set of CSS line formatting properties. It also includes baseline alignment features as well as related styles like initial line and initial letter effects. The properties defined include:

  • alignment-adjust
  • alignment-baseline
  • baseline-shift
  • dominant-baseline
  • drop-initial-after-adjust
  • drop-initial-after-align
  • drop-initial-before-adjust
  • drop-initial-before-align
  • drop-initial-size
  • drop-initial-value
  • inline-box-align
  • line-height
  • line-stacking
  • line-stacking-ruby
  • line-stacking-shift
  • line-stacking-strategy
  • text-height
  • vertical-align

Syntax of CSS rules in HTML's "style" attribute

HTML provides a style attribute on most elements, to hold a fragment of a style sheet that applies to those elements. One of the possible style sheet languages is CSS. This draft describes the syntax of the CSS fragment that can be used in the style attribute.

CSS3 module: text

This document presents a set of CSS text formatting properties. In addition to what already existed in CSS 2, many new properties address basic requirements in international contexts (mostly East Asian and bidirectional). Properties defined in this spec include:

  • all-space-treatment
  • glyph-orientation-horizontal
  • glyph-orientation-vertical
  • hanging-punctuation
  • kerning-mode
  • kerning-pair-threshold
  • line-grid
  • line-grid-mode
  • line-grid-progression
  • letter-spacing
  • line-break
  • linefeed-treatment
  • max-font-size
  • min-font-size
  • punctuation-trim
  • script
  • text-align
  • text-align-last
  • text-autospace
  • text-combine
  • text-decoration
  • text-indent
  • text-justify
  • text-justify-trim
  • text-kashida-space
  • text-line-through
  • text-line-through-color
  • text-line-through-mode
  • text-line-through-style
  • text-overflow
  • text-overflow-ellipsis
  • text-overflow-mode
  • text-overline
  • text-overline-color
  • text-overline-mode
  • text-overline-style
  • text-shadow
  • text-transform
  • text-underline
  • text-underline-color
  • text-underline-mode
  • text-underline-position
  • text-underline-style
  • unicode-bidi
  • white-space
  • white-space-treatment
  • word-break
  • word-break-CJK
  • word-break-inside
  • word-spacing
  • wrap-option
  • writing-mode

CSS TV Profile 1.0

This last call draft defines a subset of CSS Level 2 and CSS3 module: Color specifications tailored to the needs and constraints of TV devices. Comments are due by June 14.

Saturday, May 18, 2002

jCatalog Software AG has released XSLfast 1.0, a €395 graphical editor for XSL Formatting Objects documents that supports mail merge and form processing.

Friday, May 17, 2002

Opera Software has released version 6.0.2 of their namesake Opera web browser for Windows. Opera supports direct display of XML with CSS stylesheets. XSLT is not supported. This release mostly speeds things up and fixes a few bugs. However it does add one major new feature, an SMS (Short Messaging Service) panel that lets users send text messages to mobile phones directly from the browser. Currently this only works in Norway, but it will expand to other countries in the future. Opera is $39 payware or free-beer adware.


IPSI-XQ 1.0.1 is a prototype XQuery processor. It includes a parser for the user level syntax, a mapping to the core language, a static and dynamic type checker, a static type inference for the result type and a query evaluation module.


svg-coders is a new mailing list for the more advanced SVG uses dealing with interactivity, animation and server-side SVG applications. To subscribe, send e-mail with the word "subscribe" in the body to svg-coders-request@svg.ilog.fr.


Éric Bellot's released OOo2sDbk, an OpenOffice Writer to simplified DocBook converter. Python 2.1, Java 1.3, and Saxon 6.5.2 are required. OOo2sDbk is published under the LGPL. The documentation is in French.


Yann Dirson's posted sgml2x 0.99.4, a DSSSL formatter based on Jade. This release adds

  • No more pollution with temporary files
  • Automatic production of PDF bookmarks
  • Full usage of DocBook stylesheets without passing extra flags
  • Catches errors not completely reported by (open)jade
  • Uses openjade by default
  • Symbolic names for verbosity levels
  • Renaming of HTML dirs as *-html instead of *.html.

Thursday, May 16, 2002

I'm pleased to announce that I've posted the JDOM Quick Reference, Appendix B of Processing XML with Java here on Cafe con Leche. This appendix contains complete signatures and summaries for all the public classes and interfaces in JDOM. Indeed in a few cases these are more complete than what's in the JDOM JavaDoc.

As usual, I'd much appreciate hearing any comments, criticisms, or corrections you have for this appendix. I'm particularly interested in three issues:

  1. Should this appendix be combined with Appendix A, JAXP Quick Reference? My original plan separated SAX, DOM, JDOM, TrAX, and JAXP factories into separate appendixes (which I suppose is also still an option) but I eventually decided to pull the JAXP APIs into one chapter. That leaves JDOM hanging a bit since it's not a component of JAXP. I could just make one bigger appendix that covered them all, which would have the benefit of more parallel structure in the heading levels, and putting the JDOM QuickRef onto a single page for the online version.

  2. Have I accidentally included any deprecated methods? I tried to get them all out, but I may have missed one or two.

  3. Have I forgotten any important exceptions any of the methods might throw? The checked exceptions should all be there, but a lot of them are runtime exceptions and I could have missed these. In a few cases, it's a judgement call. For instance, should the JDOMFactory interface methods declare the same exceptions as the DefaultJDOMFactory class does? (In this case, I deliberately decided not to include them because that's an implementation detail.) This one's particularly important because I plan to go through the Javadoc for the JDOM classes and patch up the @throws clauses based on what I discovered here, so this can have an impact beyond just this book.

Please take a look and let me know what you think. I'm almost done and time is getting short, so if there are any other thoughts on any parts of the book you've been procrastinating about, now is the time to send them. Your comments have been extremely helpful up till now, and are always much appreciated. Thanks!

Wednesday, May 15, 2002

Sean Russell's released version 2.3.3 of REXML, an open source XML parser for Ruby. This release adds a new PullParser API, a SAX2 streaming parser API, speed optimizations, bug fixes, and filters. REXML is distributed under the Ruby license.


Jaxe is an open source graphical XML editor written in Java aimed at narrative documents. The installation process is quite unpolished, and judging by the screenshots the user interface is in French (though intelligible to an English speaker), but at least this isn't yet another tree-based editor.


TM4J 0.7.0 beta 1 is an open source topic map processing toolkit for Java as well as a set of topic map processing tools. Topic maps are an ISO standard for the interchange of information structures which can be used to represent ontologies, business data and processes, individual knowledge and opinions, and more. This engine processes files conforming to the XML Topic Maps (XTM) specification and stores them either in memory or in a persistent store, providing access via a Java API. Version 0.7.0 adds an extensible indexing system, an implementation of the tolog topic map query language, overridable change notifications, and improved XTM 1.0 Annex F conformance.


Jochen Wiedmann's released JaxMe 1.2.9, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. It does support JDBC mapping to an SQL table and reading from joined tables.

Tuesday, May 14, 2002

I'm pleased to announce that I've posted the JAXP Quick Reference, Appendix A of Processing XML with Java here on Cafe con Leche. This appendix contains complete signatures and summaries for all the public classes and interfaces in the various APIs that make up JAXP including:

  • DOM Level 2
  • SAX 1
  • SAX 2
  • TrAX
  • javax.xml.parsers
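
As a rough illustration of how these pieces fit together, the sketch below uses the javax.xml.parsers factory to obtain a SAX 2 parser without naming any particular parser implementation. The ElementCounter class and its sample input are made up for this example and do not appear in the appendix.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {

    private int count = 0;

    // SAX 2 callback, invoked once for each start-tag or empty-element tag.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        count++;
    }

    // Locates a parser through the JAXP factory and counts the elements.
    public static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        ElementCounter handler = new ElementCounter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><c><d/></c></a>"));
    }
}
```

The only JAXP-specific class here is SAXParserFactory; everything else is ordinary SAX 2, so the same handler works with whichever parser the factory finds.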

As always your comments are appreciated. The end is in sight. Just one more appendix and a preface to go. With a little luck I may finish the first draft this week.

Monday, May 13, 2002

Limewire 2.4.3, the open source Gnutella client, has been released. That wouldn't normally qualify as news for Cafe con Leche, except that it has a really annoying incompatibility with Xerces 2.0.1 (as did 2.2.3 before it). Since bug reports to Limewire appear to vanish into the ether, I'm hoping someone who reads this might be able to fix it. I suspect, though I don't know for sure, that one of two things is happening. Either Limewire's DOM code depends on the Xerces implementation classes directly, or it's serializing a DOM using Java object serialization instead of plain vanilla XML. Either way the bug would tie it to a specific version of Xerces. Or it could be something more subtle, like writing code that depends on bugs in particular versions of Xerces. I'm not sure, but I really do wish somebody would fix this! For the moment, I'm stuck using AudioGalaxy which has a much broader selection of files, but a truly abysmal user interface.
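
For what it's worth, here is a minimal sketch of the version-independent alternative to both suspected causes: build the DOM through the JAXP factory rather than a Xerces class, and write it out as XML text with a TrAX identity transform rather than Java object serialization. This is not Limewire's code, which I haven't seen; the PortableDom class and the sample query document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PortableDom {

    // Builds a DOM through the JAXP factory; no parser-specific class is named.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Writes the DOM back out as XML text using an identity transform.
    public static String toXml(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        Document doc = parse("<query><keyword>xml</keyword></query>");
        System.out.println(toXml(doc));
    }
}
```

Because the saved form is just XML, any conforming parser can read it back, whichever version of Xerces (or other parser) happens to be on the class path.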


Andy Clark's updated his NekoHTML open source HTML parser to version 0.6. According to Clark,

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.

NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.

Changes in this release include

  • Custom document filters can be appended to the default NekoHTML parser pipeline
  • Filters for serializing HTML documents and removing elements from the document event stream
  • Experimental functionality to allow applications to dynamically insert content into the HTML document stream;
  • A minimal Xerces2 Jar file containing just the files required for using the HTMLConfiguration class directly to alleviate full dependence on Xerces2 distribution;
  • Bug fixes

Sunday, May 12, 2002

Sean Russell's released version 2.3.0 of REXML, an open source XML parser for Ruby. This release adds support for internal entities and some speed-ups. You can use it under either the Ruby license or the GPL.

Saturday, May 11, 2002

The second release candidate of Mozilla 1.0 has been posted for the usual batch of platforms (Mac, Windows, Linux, Solaris, OpenVMS, et al). No major new features in this release, but lots of bugs have been fixed. This has been my default browser on Windows for almost a year now, but the Macintosh version still has at least one show stopping bug in AppleScript that keeps me from using it regularly.

Friday, May 10, 2002

The Apache XML Project has posted the second beta of Batik 1.5, an open source Scalable Vector Graphics (SVG) viewer, renderer, and converter based on the Java 2D API. Version 1.5 is faster, has much better scripting and DOM support, and is more compatible with Mac OS X.


Claudio Tasso's XPointerAPI 1.1.1 is a GPL'd Java XPointer library based on Xalan. Maybe with this in hand, I'll finally be able to add XPointer support to my XInclude engines.


Steve Meyfroidt's txt2xml 1.2 is an open source Java library that can parse structured text into well-formed XML which is then output as SAX, DOM, JDOM, or a stream.

Thursday, May 9, 2002

Michael Fuchs has posted version 0.2.2 of his DocBook Doclet that creates DocBook SGML and XML documents from JavaDoc. This release adds support for DocBook elements caution, important, note, tip, and warning.


The first milestone build of OpenOffice for Mac OS X has been posted. This build uses the X11 Windowing System from XFree86.org to run on either Mac OS X or Darwin. This is a developer release suitable for programmers who want to help finish the port. OpenOffice is a complete open source office suite that saves its documents in gzipped XML.


Andy Clark's updated his NekoHTML open source HTML parser to version 0.5. According to Clark,

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.

NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.

Changes in this release include

  • Fixed some location reporting information bugs;
  • Added feature to report character boundaries of events via the associated augmentations object;
  • Added feature to disable tag balancing
  • Added features to notify handlers of start and end of character and built-in XML and HTML entity references.

TM4J 0.6.2 is an open source topic map processing toolkit for Java as well as a set of topic map processing tools. Topic maps are an ISO standard for the interchange of information structures which can be used to represent ontologies, business data and processes, individual knowledge and opinions, and more. This engine processes files conforming to the XML Topic Maps (XTM) specification and stores them either in memory or in a persistent store, providing access via a Java API.

Wednesday, May 8, 2002

I am very pleased to announce that I have posted The JDOM Model, Chapter 15 of Processing XML with Java and most importantly the last chapter in the book. There's still a preface and a couple of appendixes to be written, but at this point the main body of the book is complete. This is now a complete introduction to writing Java programs that read, manipulate, search, query, and output XML documents. I think this book is more up-to-date and more complete on these matters than any other book on the market. It covers DOM, SAX, JDOM, TrAX, JAXP, and a lot more. The actual paper version should be out in a couple of months.

The latest chapter focuses on the core node classes in the org.jdom package: Element, Attribute, Text, etc. It discusses the methods of each one in detail and shows you how to use it. It introduces filters and many other techniques for navigation. And as with all the chapters, it points out the rough spots where JDOM can burn you if you aren't careful. This chapter's far and away the most up-to-date coverage of the very latest JDOM version you can get anywhere. It is essential reading for anyone who's using or considering using JDOM.

Tuesday, May 7, 2002

I opened up my e-mail last night to find a huge number of messages from various automated anti-virus scanners complaining about an infected message I allegedly sent to the xml-interest mailing list, which was a bit of a surprise since I generally practice safe computing (that is, I avoid Outlook like the plague, which is, now that I think about it, a very appropriate cliche.)

On further investigation, I tend to doubt that I actually sent the message. Instead it seems to be the result of the W32.Klez worm infecting the system of somebody who is careless enough to have Outlook installed on their system. Apparently, this worm randomly chooses a From address from the e-mail addresses it finds on the local system and uses that, rather than using the infected host's actual address. On this occasion, it happened to choose me. I suspect inspection of the actual headers of the original message would reveal the true culprit.

To compound matters, the worm sent the message to a mailing-list that doesn't require subscription in order to post so the message got forwarded to 2,125 subscribers, some of whom were running filters that blocked the message (good) and automatically notified me about the virus-laden e-mail I didn't actually send them (bad).

What can be done to stop this in the future? I can think of a few things:

  • Mailing lists should disallow posts with attachments. They should most especially disallow posts with attachments from non-subscribers.
  • Antivirus filters should stop assuming that they know who actually sent the message to them, at least without much more sophisticated heuristics to figure out whether or not the From address is forged.
  • Individuals should stop using Outlook. There's really no excuse for this. In 2002, running Outlook is like smoking a big stinky cigar in a crowded subway car full of asthmatics. It pollutes the environment and is actively hostile to your fellow human beings. If you still have Outlook on your system, delete it. Delete it now. And while you're at it, it wouldn't hurt to get rid of Word, Office, Windows, and any other Microsoft software within reach.

Here are a few e-mail programs that are better than Microsoft Outlook:

Bruce Eckel wrote in to recommend Calypso, which you can download from WebAttack's free e-mail clients page. This page also lists numerous other free-beer e-mail programs you can replace Outlook with.

Monday, May 6, 2002

Once again I'm chairing the XML track at Software Development 2002 East in Boston this November 18-22. The call for papers is now live. Tracks include .NET Programming, C++, Java Programming, Project Management, Requirements & Analysis, Scripting Languages, Web Services, Wireless, and XML, though I'm only personally involved with the XML track. Session types include 90 minute classes, full and half-day tutorials, panels, roundtables, brown bag case studies, birds-of-a-feather gatherings and keynotes. The deadline for submitting abstracts is this Friday, May 10, 2002.

For the XML track, I try to select a broad range of tutorial-focused sessions that cover specific technologies. For example, Intro to Schemas, Intro to XSLT, Overview of XML Security, DOM for Java programmers, etc. The key idea is that the session should be a 90-minute to full-day introduction and how-to session about some specific XML technology that's reasonably well-cooked and can be used today. We find that our audience likes very practical sessions and is not as receptive to bleeding edge technologies (e.g. DOM Level 3, XSLT2) and advanced, research level presentations (e.g. DAML and Quantum Topic Maps, Optimization Schemes for XSLT, SOAP vs. REST). Our audience tends to be Java and C++ professionals who are using some XML rather than fulltime XML hackers. For the XML track, we are especially interested in 90-minute and half-day sessions. If you have any questions, feel free to e-mail me.

Sunday, May 5, 2002

IBM has released the WebSphere Voice Toolkit 2.0 for Windows 2000, an integrated development environment (IDE) for VoiceXML that includes:

  • VoiceXML editor
  • VoiceXML debugger
  • Grammar editor
  • Grammar test tool
  • Pronunciation builder
  • Built-in audio recorder
  • VoiceXML Reusable Dialog Components
  • Speech recognition engine
  • Text-To-Speech engine

Saturday, May 4, 2002

The W3C Resource Description Framework (RDF) Core Working Group has published four new working drafts:

RDF Vocabulary Description Language 1.0: RDF Schema
Quoting from the abstract, "The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web. This specification describes how to use RDF to describe RDF vocabularies. This specification also defines a basic vocabulary for this purpose, as well as conventions that can be used by Semantic Web applications to support more sophisticated RDF vocabulary description."
RDF Model Theory
The abstract states, "This is a specification of a model-theoretic semantics for RDF and RDFS, and some basic results on entailment. This document was written with the intention of providing a precise semantic theory for RDF and RDFS, and to sharpen the notions of consequence and inference."
RDF Primer
According to the abstract, RDF "is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, the copyright and syndication information about a Web document, the availability schedule for some shared resource, or the description of a Web user's preferences for information delivery. RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. Exchanging information between different applications means that the information may be made available to applications other than those for which it was originally created. This Primer is designed to provide the reader the basic fundamentals required to effectively use RDF in their particular applications."
RDF Test Cases
This document describes a set of machine-processable test cases for RDF though it does not contain the test cases themselves which are available separately.

In addition the W3C's Yves Lafon and Bert Bos have published a note on Describing and retrieving photos using RDF and HTTP.

Friday, May 3, 2002

The XUpdate working group has posted Lexus XML:DB 0.2.2, the pure Java reference implementation of XUpdate. This release fixes some namespace bugs and supports JAXP.


IBM's alphaWorks has updated ToXgene, a "template-based generator for complex, semantically-correlated collections of XML documents. The data generation process in ToXgene is based on a conceptual description of the data to be generated (the templates). This tool is intended for cases in which the structure of the data to be generated is known, the data is required to conform to that structure, and multiple collections of documents, with varying structures, sizes and complexities, can easily be generated." This version adds support for recursive XML content, improved error-reporting, and seeding of random generators.


IBM's alphaWorks has also updated their XML Security Suite, a Java library that allegedly supports XML encryption and XML digital signatures. I found the previous version to be completely non-functional and essentially unusable. Hopefully, this release is more reliable.


Finally, alphaWorks has updated their XML for C++ parser to version 4.0.1. This release fixes some thread-safety problems, corrects a memory leak in IDOM, fixes some DOMString problems with Asian code pages, fixes a bug in Base64, uses the International Classes for Unicode 2.0.2, and includes many other bug fixes.


Andy Clark's updated his NekoHTML open source HTML parser. According to Clark,

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.

NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.

Changes from the previous release include various bug fixes and support for location information.

Thursday, May 2, 2002

The W3C Web Services Architecture Working Group has published the initial working drafts of Web Services Architecture Requirements and Web Service Description Requirements. One of the more interesting pieces in these drafts is the following definition:

A Web Service is a software application identified by a URI [IETF RFC 2396], whose interfaces and binding are capable of being defined, described and discovered by XML artifacts and supports direct interactions with other software applications using XML based messages via internet-based protocols.

That's useful, given that web services are one of those technologies whose exact definition seems to change depending on which vendor is trying to sell you what when.

However, this is all very early, and these drafts do not yet reflect a consensus within the working group so this is all still up in the air, and some major pieces are almost certain to change over the next year.


The W3C XSLT and XQuery working groups have posted six new working drafts:

Michael Kay's released SAXON 7.1, a partial and experimental implementation of the XSLT 2.0 working draft.


The W3C SVG Working Group has posted candidate recommendations of the Scalable Vector Graphics (SVG) 1.1 Specification and Mobile SVG Profiles: SVG Tiny and SVG Basic. SVG Tiny is a stripped down version of SVG for cell phones. SVG Basic is a slightly larger version of SVG for PDAs. The big change since the last call working drafts is that the W3C XML Schema Language schemas have been removed. Apparently, they weren't modular enough. Only a DTD is provided. Comments on both specs are due by June 23.


MayuraDraw 4.0 is a $25 shareware Windows drawing program that can import Adobe Illustrator, Windows metafile, GIF, JPEG, PNG, TIFF and BMP formats and export Scalable Vector Graphics (SVG), as well as EPS, PostScript, Illustrator, PDF, WMF, GIF, JPEG, PNG, BMP and TIFF.

Wednesday, May 1, 2002

Peter Wainwright has released SVG::Parser 0.97, an XML parser for SVG within the CPAN SVG module framework. This module enables developers to read an existing SVG document, parse it, and then use the SVG Perl module to modify the file contents. You can download it through CPAN.


The Apache Jakarta Project has posted the first beta of JXPath 1.0, an open source XPath interpreter for Java. JXPath applies XPath expressions to graphs of objects of all kinds: JavaBeans, collections, arrays, maps, servlet contexts, DOMs, and mixtures thereof.


OpenOffice 1.0, a large open source suite of productivity applications (word processing, spreadsheet, drawing tool, presentation software, etc.) for Windows, Solaris, and Linux, has been released. OpenOffice file formats are all native XML that has been gzipped. OpenOffice will allegedly read and write Microsoft Office file formats, but I won't believe that until I've had a chance to prove it on my own files. The OpenOffice web site is a little overwhelmed right now (Wednesday morning) by the release 1.0 traffic. You may want to wait a day or two before checking this out.


Norm Walsh has launched a WikiWikiWeb for DocBook.


Michael Fuchs has posted version 0.21 of his DocBook Doclet that creates DocBook SGML and XML documents from JavaDoc. This release adds support for the rowspan attribute of td and th tags.

Tuesday, April 30, 2002

Aleksey Sanin's XML Security Library (current version 0.0.5) is an open source C library based on libxml2 and OpenSSL that supports

  • XML Signature
  • XML Encryption
  • Canonical XML
  • Exclusive Canonical XML

Daniel Veillard's posted libxslt 1.0.17 and libxml 2.4.21, the Gnome Project's XSLT and XML libraries respectively. Both include various code clean-ups, small bug fixes, and portability improvements. libxml adds some initial code for supporting the W3C XML Schema Language. However, it "is unstable and not compiled by default."


Microsoft has posted service pack 1 for its MSXML 4 parser. Most of the bug fixes in this release relate to schema validation. There are also a few that affect DOM and XSLT compliance. There may also be some speed-ups.

Monday, April 29, 2002

The W3C XML Core Working Group has published another working draft of XML 1.1. This draft is something of an improvement over the previous one. In particular, the C0 control characters like bell, vertical tab, and formfeed have been forbidden as they are in XML 1.0. (That's one bullet dodged.) However, this draft still allows many name characters that should be forbidden including some very weird characters like ©, ±, ⁷ (&#x2077;, superscript 7), the musical symbol for a six-string fretboard, and the zero-width space. It also allows private-use characters and undefined characters, both of which will produce non-interoperable documents.

Furthermore, XML 1.1 introduces extra line ending characters &#x85; and &#x2028;. The former is used on IBM mainframes that are stuck in the 1970s. The latter is a theoretical character that is not actually used in practice today. Both will cause significant problems when XML documents are edited in many existing tools, especially text editors like emacs.
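
To make the tooling concern concrete, here is a minimal sketch, assuming Python 3 and the expat-based XML 1.0 parser bundled with it. None of this comes from the draft itself; the sample strings and helper name are made up for illustration. It shows that an XML 1.0 parser normalizes CR LF to LF but hands the two new characters to the application untouched, and that a tool which only splits on "\n" does not see them as line breaks at all.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"  # NEXT LINE, the mainframe line ending
LS = "\u2028"   # Unicode LINE SEPARATOR


def parsed_text(raw_text):
    """Parse <r>raw_text</r> as XML 1.0 and return the text the application sees."""
    return ET.fromstring("<r>" + raw_text + "</r>").text


# XML 1.0 line-end handling: CR LF is reported as a single LF...
crlf_result = parsed_text("a\r\nb")

# ...while NEL and LS pass through as ordinary characters.
nel_result = parsed_text("a" + NEL + "b")
ls_result = parsed_text("a" + LS + "b")

# A tool that splits only on "\n" sees one line; a Unicode-aware
# splitter (str.splitlines) sees two.
newline_only = ("a" + NEL + "b").split("\n")
unicode_aware = ("a" + NEL + "b").splitlines()
```

Two tools that disagree about where lines end in the same file is exactly the kind of mismatch that makes editing such documents error-prone.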

The second big change in this draft is a requirement that XML 1.1 documents be placed in character normalized form. In brief, this requires that no names, text sections, attribute values, and so forth begin with a combining character such as combining accent acute. These characters can still be used. They just can't be the first character (which makes sense because they're always combined with the character that precedes them). Furthermore (and here's the hard one for implementors) you're no longer allowed to use character sequences like &#x65;&#x0301; (e - combining accent acute) when an equivalent combined form exists. Instead you must use the precombined form &#xE9;. On the other hand, you could use &#x6B;&#x0301; (k - combining accent acute) because there's no precombined form for that character.

There's also a probably unintentional interaction between the XML 1.1 draft and the Character Model. XML 1.1 requires that all documents be in include normalized form. However, that requires that all includes be resolved first, just to do well-formedness checking. According to the character model, "An include is an instance of a syntactic device specified in a language to include an entity at the position of the include, replacing the include itself. Examples of includes are entity references in XML, @import rules in CSS and the <xsl:include> element in XSLT. Character escapes are a special case of includes where the included entity is a single character." XIncludes aren't mentioned, but they're clearly implied as well. Non-validating processors can no longer ignore external entities of whatever kind. They must be resolved before well-formedness checking can be completed.

The bottom line: despite the small improvements in this working draft, XML 1.1 is still a very bad idea at its core. It will wreak massive havoc with the installed base of XML processing and text editing software in exchange for a very limited benefit for a very small number of users. Even if you think the benefits outweigh the massive costs, the changes proposed here are far from the minimal set required to achieve those benefits. This specification should be rejected completely.


Version 1.0.8 of the XMLPULL API has been released. New functions include:

  • getAttributeType(index)
  • isAttributeDefault(index)
  • setInput(InputStream is, String inputEncoding)
  • getInputEncoding()
  • nextText() replaces readText
  • nextTag()

Sunday, April 28, 2002

Bob McWhirter's released Jaxen 1.0, an open source, model-independent XPath engine for Java that supports DOM, JDOM, dom4j, and ElectricXML. This is not quite a complete implementation of XPath 1.0, but it's close. An earlier beta of Jaxen is discussed in Chapter 16 of Processing XML with Java. A few important class names have changed in this release, so I'll be updating that chapter soon.

Saturday, April 27, 2002

Fabio Giannetti's posted WH2FO 0.3.0, an open source Java application that transforms Microsoft Word HTML output to an XML content file and an XSL stylesheet file that produces XSL Formatting Objects (XSL-FO).


Enrico Schnepel's posted version 0.4.2 of html2fo, a GPL'd HTML to XSL Formatting Objects converter written in C.

Thursday, April 25, 2002

Sun's posted the second proposed draft specification of the Java API for XML-Based RPC in PDF and HTML format.


Bob McWhirter's posted the first release candidate of Jaxen 1.0, an open source, model-independent XPath engine for Java that supports DOM, JDOM, dom4j, and ElectricXML. This is not quite a complete implementation of XPath 1.0, but it's close. An earlier beta of Jaxen is discussed in Chapter 16 of Processing XML with Java. A few important class names have changed in this release, so I'll be updating that chapter soon.

Wednesday, April 24, 2002

The W3C has updated the requirements document for Scalable Vector Graphics 1.1 and beyond. SVG 1.1/1.2/2.0 Requirements now lays out a rough map for the next three versions of SVG as follows:

  • SVG 1.1 is a modularized version of SVG 1.0, including errata from SVG 1.0 and the minimum number of new features required to develop an SVG profile for mobile devices
  • SVG 1.2 is an incremental upgrade to SVG 1.1 that will add only the most needed and most requested new features to SVG.
  • SVG 2.0 will be the next major upgrade that adds big new features.

I wish some other working groups, XSLT in particular, could take this sort of more incremental approach to upgrades.


Martin Klang's posted an alpha release of TagBox, an open source XPath 1.0 engine for Java.

Tuesday, April 23, 2002

August Mueller's released GMXMLParser 1.1, a $24.95 XML parser for REALbasic based on James Clark's expat.

Monday, April 22, 2002

I'm speaking at the Software Development 2002 West conference in San Jose this week so updates may be a little slow until I get back to New York next week. In the meantime, you can check out the notes for my four presentations:

Saturday, April 20, 2002

The first release candidate of Mozilla 1.0 has been posted for the usual batch of platforms (Mac, Windows, Linux, et al). RC1 implements one-button publishing in Mozilla Composer, reorganizes the menu bar and context menus for improved usability, implements LDAP over SSL, implements Mail Return receipts, and adds a new Download Manager. Mozilla supports XML, XHTML, HTML, CSS, XSLT, MathML, and more. My absolute favorite feature is the ability to turn off pop-up ads. This has been my default browser on Windows for almost a year now, but the Macintosh version still has at least one show stopping bug in AppleScript that keeps me from using it regularly.

In related news, Patrick C. Beard's posted the first final candidate build of the Macintosh Runtime for Java plug-in for Mozilla/Netscape 6.x on MacOS X.

Friday, April 19, 2002

I've posted JDOM, Chapter 14 of Processing XML with Java, on Cafe con Leche. This chapter introduces JDOM, an open source, pure Java, tree-based API for processing XML that's much simpler than DOM. This chapter covers the basic design of JDOM, as well as parsing and serializing XML documents with JDOM. As always, all comments are appreciated.

Thursday, April 18, 2002

Jez Higgins's SAX in C++ is a set of SAX2 bindings for C++ that includes SAX2 wrappers for expat, libxml, Xerces, and MSXML.


Opera Software has posted the second beta of Opera 6.0 for Linux, a web browser with built-in support for direct display of XML styled with CSS. Beta 2 improves support for non-Roman alphabets. Opera 6.0 is adware/$39 payware (your choice).


Lucid'i.t. has posted the second alpha release of the Lucid XML Toolkit Personal Edition, a schema and DTD validating parser that supports SAX2 and DOM2.

Wednesday, April 17, 2002

The W3C has released the final recommendation of the Platform for Privacy Preferences 1.0 specification. According to the introduction:

The Platform for Privacy Preferences Project (P3P) enables Web sites to express their privacy practices in a standard format that can be retrieved automatically and interpreted easily by user agents. P3P user agents will allow users to be informed of site practices (in both machine- and human-readable formats) and to automate decision-making based on these practices when appropriate. Thus users need not read the privacy policies at every site they visit.

Although P3P provides a technical mechanism for ensuring that users can be informed about privacy policies before they release personal information, it does not provide a technical mechanism for making sure sites act according to their policies. Products implementing this specification MAY provide some assistance in that regard, but that is up to specific implementations and outside the scope of this specification. However, P3P is complementary to laws and self-regulatory programs that can provide enforcement mechanisms. In addition, P3P does not include mechanisms for transferring data or for securing personal data in transit or storage. P3P may be built into tools designed to facilitate data transfer. These tools should include appropriate security safeguards.

The W3C P3P Specification Working Group has also published a new working draft of A P3P Preference Exchange Language 1.0 (APPEL1.0). According to the abstract, "This document complements the P3P1.0 specification [P3P10] by specifying a language for describing collections of preferences regarding P3P policies between P3P agents. Using this language, a user can express her preferences in a set of preference-rules (called a ruleset), which can then be used by her user agent to make automated or semi-automated decisions regarding the acceptability of machine-readable privacy policies from P3P enabled Web sites."


MXP1 is an open source XML pull parser that implements the XMLPULL API.

Tuesday, April 16, 2002

Oracle's released version 9.2.0.2 of their XML Developer Kits for Java, JavaBeans, C, C++, and PL/SQL. This release fixes assorted bugs. Registration is required.


Andy Clark's updated his NekoHTML open source HTML parser. According to Clark,

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.

NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.

Changes from the previous release include:

  • Properties to control the case of element and attribute names
  • Only known HTML elements have their names modified according to the properties
  • New property to set the default encoding
  • A feature to augment the infoset to report "synthesized" events
  • A feature to enable error reporting with localized error messages
  • Location information can be reported
  • More elements are properly scanned as "special".
  • More documentation

Yuval Oren has released Piccolo 1.0, an open source XML parser for Java that supports SAX1, SAX2 extensions 1.0, and JAXP 1.1 (SAX parsing only). It is published under the GNU Lesser General Public License (LGPL).

Saturday, April 13, 2002

Michael Kay's released version 6.5.2 of Saxon, my XSLT processor of choice. Saxon is open source under the Mozilla Public License. Java 1.1 or later is required. This release fixes assorted bugs, but adds no new features.

Friday, April 12, 2002

The W3C DOM Working Group has posted two new working drafts for DOM Level 3:

I have to read through these more carefully, but there don't seem to be any fundamental changes since the last drafts. (e.g. the abstract schemas model still only really works for DTDs and the W3C XML Schema Language, not RELAX or Schematron.) However, a lot of the method signatures have been changed, along with other details. The biggest changes are in the abstract schemas API.

Thursday, April 11, 2002

I've posted the first draft of XSLT, Chapter 17 of Processing XML with Java on Cafe con Leche. This chapter includes three major sections:

  • A brief XSLT tutorial
  • Detailed discussion of the TrAX API for integrating XSLT with Java
  • Writing XSLT extension functions and elements in Java

I'm particularly pleased with the first section. I've written several XSLT tutorials before, but this one is radically different. It considers XSLT primarily as a functional programming language and focuses on the ability to call templates recursively. I doubt there's anything here that hasn't been discovered or invented by someone somewhere before, but certainly I had never seen some of the things you could do with XSLT until I invented them for this chapter. I wouldn't recommend this as your first exposure to XSLT (for that see Chapter 17 of the XML Bible), but if you're already familiar with XSLT basics, this chapter may show you a few new tricks.

The end is in sight. I've just got a couple more quick chapters about JDOM to write and a few appendixes and I'll be done. This book may yet see print this Spring as planned. Keep your fingers crossed. As usual all comments and criticisms are appreciated.

Wednesday, April 10, 2002

Aleksander Slominski and Stefan Haustein have released the XMLPULL API 1.0, a Common API for XML Pull Parsing. At a high level, pull parsing is similar to SAX push parsing in that the parts of a document are presented to a program sequentially, one at a time, in the order they appear in the document. Thus it's fairly memory efficient. However, it differs from SAX in that the client program must explicitly request the next part when it's ready for it rather than having it automatically pushed to it.

XMLPULL is a minimalist API derived from kXML and XPP. It can be implemented from scratch or on top of existing XML parsers. Like SAX, XMLPULL is in the public domain. Two implementations of the XMLPULL API are currently available: kXML2 and XPP3.
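
The push versus pull distinction is easier to see in code. The sketch below is not XMLPULL, which is a Java API; it is an analogy using Python's standard library, with `xml.sax` standing in for push parsing and `xml.dom.pulldom` standing in for pull parsing. The sample document, the `limit` parameter, and the function names are invented for illustration.

```python
import xml.sax
from xml.dom import pulldom

DOC = "<order><item>tea</item><item>milk</item></order>"


class PushHandler(xml.sax.ContentHandler):
    """Push style: the parser decides when to call us."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def push_names(document):
    """Collect element names by letting SAX push every event at the handler."""
    handler = PushHandler()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.names


def pull_names(document, limit=None):
    """Collect element names by asking for one event at a time.

    Because the client drives the loop, it can simply stop asking
    once it has seen enough.
    """
    names = []
    for event, node in pulldom.parseString(document):
        if event == pulldom.START_ELEMENT:
            names.append(node.tagName)
            if limit is not None and len(names) >= limit:
                break
    return names
```

Both functions see the same elements in document order. The difference is who owns the loop: in `pull_names` the application does, which is why stopping early is a plain `break` rather than an exception thrown out of a callback.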


Jochen Wiedmann's released JaxMe 1.2.3, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. It does support JDBC mapping to an SQL table and reading from joined tables. This release adds support for log4j.

Tuesday, April 9, 2002

Michael Fuchs has posted version 0.18 of his DocBook Doclet that creates DocBook SGML and XML documents from JavaDoc. This release adds support for the colspan attribute of td and th tags as well as sample property files.

Monday, April 8, 2002

Sean Russell's released REXML 2.1.0, an open source XML parser written in and for the Ruby programming language. This version adds a number of small fixes, optimizations, API changes, and new features including ISO-8859-1 output.

Sunday, April 7, 2002

Andy Clark's updated his NekoHTML parser for Xerces2. This release "fixes a few bugs and adds some convenient DOM and SAX parser classes so it's a little easier to use directly."

Saturday, April 6, 2002

The W3C Voice Browser Working Group has posted a new public working draft of the Speech Synthesis Markup Language Specification. According to the abstract:

The Voice Browser Working Group has sought to develop standards to enable access to the web using spoken interaction. The Speech Synthesis Markup Language Specification is part of this set of new markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate and etc. across different synthesis-capable platforms.

Friday, April 5, 2002

IBM's alphaWorks has released version 4.0.1 of their XML Parser for Java. This release is based on Xerces 2.0 and supports

  • W3C XML Schemas
  • SAX1 and SAX2
  • DOM Level 1, DOM Level 2, and some experimental features of DOM Level 3 Core, Abstract Schema and Load/Save Working Drafts
  • JAXP 1.1

HotSAX 0.1 is a non-validating SAX2 parser for HTML/XML/XHTML. It has the unique ability to parse malformed HTML through a SAX interface.
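
For a rough idea of what event-based parsing of malformed HTML looks like, here is a sketch using Python's standard `html.parser` module. This is an analogy only and says nothing about how HotSAX itself is implemented; the `TagLogger` class and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start tags, end tags, and non-blank text as simple tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


def html_events(markup):
    """Feed markup to the tolerant parser and return the recorded events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Unclosed <p> and <b> elements would be fatal errors in an XML parser.
MALFORMED = "<p>one<p>two <b>bold</p>"
events = html_events(MALFORMED)
```

Note that this parser only reports what it sees. It does not insert the missing end tags, so balancing the tree is still left to the application.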

Thursday, April 4, 2002

The W3C XML Core Working Group has published the first public working draft of Namespaces in XML 1.1 Requirements. This adds a single new feature - the ability to undeclare namespaces (Particularly useful for the default namespace, and also useful when serializing the results of XInclude processing.) The plan is to tie Namespaces 1.1 to XML 1.1, so that only XML 1.1 documents can use Namespaces 1.1 syntax.
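
The gap that Namespaces 1.1 fills can be demonstrated with a parser that implements the current namespaces rules. This is a minimal sketch, assuming Python 3 and its bundled expat-based, namespace-aware parser; the example URI and the `parses` helper are hypothetical. Under the existing rules the default namespace can be undeclared with `xmlns=""`, but a prefix cannot be undeclared with `xmlns:p=""`.

```python
import xml.etree.ElementTree as ET


def parses(document):
    """Return True if the namespace-aware XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Legal today: the inner element drops back to no namespace.
UNDECLARE_DEFAULT = '<a xmlns="http://example.com/ns"><b xmlns=""/></a>'

# Not legal today: an empty value for a prefixed declaration.
UNDECLARE_PREFIX = '<p:a xmlns:p="http://example.com/ns"><b xmlns:p=""/></p:a>'

inner_tag = ET.fromstring(UNDECLARE_DEFAULT)[0].tag
```

The second document is the syntax that the requirements draft proposes to allow, and only in XML 1.1 documents.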

Saturday, March 30, 2002

I'm travelling for the next week. Updates will probably be a little slow until Thursday.


Build 641d of OpenOffice, an open source office suite for Windows and Linux that uses XML as its native file format, has been posted. This is possibly the last release before OpenOffice 1.0.

Friday, March 29, 2002

Beta 8 of JDOM, an open source class library for processing XML with Java, has been posted. JDOM uses a tree-based model that is roughly similar in structure to DOM, but much, much simpler. Beta 8 changes a lot of signatures since beta 7, and most programs will need to be rewritten to account for API changes. Changes since beta 7 include:

  • A new Text class to represent text nodes.
  • Filter lists
  • More complete checking of well-formedness constraints when creating new content
  • Various new convenience methods
  • Internal DTD subset support
  • The ability to configure SAX builders with features and properties
  • Attributes now know their type
  • Documents can be rootless (a very bad idea IMO)

And of course there are many bug fixes and performance optimizations throughout the code.

Thursday, March 28, 2002

The W3C DOM Working Group has posted a last call working draft of Document Object Model (DOM) Level 3 XPath Specification. At first glance the changes since the last draft seem quite minor. Now I have to update Chapter 16 of Processing XML with Java. I have to say I'm not very thrilled by this API, but at least it's not as abysmal as DOM 3 Abstract Schemas.


The W3C XML Query and XSL Working Groups have posted a new working draft of XQuery 1.0 Formal Semantics. This has been overdue for some time now. Hopefully this draft will clear up a lot of confusion over XQuery. I'll be talking about this in my XPath 2.0 and Beyond seminar at Software Development 2002 West next month.


Unicode 3.2 has been released. Version 3.2 adds 1,016 additional characters including many new characters for mathematical and technical publishing, four indigenous scripts of the Philippines, recycling symbols, and Khmer. These characters are legal in XML documents immediately, although few fonts are available and most application software won't recognize them.

Wednesday, March 27, 2002

Rajiv Mordani, the maintenance lead for JSR-63 Java API for XML Processing, has posted a change log for maintenance review. Comments are due by April 22, 2002. I've been noticing a lot of incomplete specifications all over the javax.xml packages lately, none of which are yet addressed by this document. I'm submitting them to the comments address. I'm curious to see what will happen.


The Apache Project has released Cocoon 2.0.2, "an XML framework that raises the usage of XML and XSLT technologies for server applications to a new level. Designed for performance and scalability around pipelined SAX processing, Cocoon offers a flexible environment based on the separation of concerns between content, logic and style. A centralized configuration system and sophisticated caching top this all off and help you to create, deploy and maintain rock-solid XML server applications." Version 2.0.2 is "a maintainance release focusing on improved performance and robustness. In addition some bugs were fixed and new features were added." New features include:

  • A BootstrapServlet that lets Cocoon run in non-compliant servlet engines that don't handle servlet contexts correctly.
  • Error reporting includes the line, column and location attributes specified in SAXException and TransformerException.
  • New "set-content-length" configuration for FOPSerializer to allow streaming of large PDFs
  • New POI HSSF Serializer; outputs to the .xls (not .xsl) file format.
  • New module structure for input, output, and database specifica in scratchpad. Thus it is possible to write generic components for one task and replace input and output dynamically.
  • A new Jisp-based persistence cache, to (1) improve performance and (2) solve the problem with long filenames on Windows OS flavours.
  • An encodeURL transformer for encoding URIs.
  • JavaScript is now supported in XSP pages.

Randy J. Ray's RPC::XML 0.37 is a set of Perl classes for XML-RPC clients and servers.

Tuesday, March 26, 2002

BEA Systems has submitted an initial Java Specification Request (JSR) to the Java Community Process for a Streaming API for XML (StAX). This JSR proposes a Java-based, pull-parsing API for XML using "a simple iterator based API. This allows the programmer to ask for the next event (pull the event) and allows state to be stored in a procedural fashion." Comments are due by April 1.
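The pull style described in that quote can be sketched without StAX itself, which is only a proposal at this point. The following is a minimal illustration, assuming Python's standard `xml.etree.ElementTree.XMLPullParser` as a stand-in for an iterator-based pull API; the `titles` function and the feed/item/title vocabulary are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET

def titles(xml_text):
    """Collect the text of each title inside an item by pulling events."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()

    found = []
    in_item = False  # state is kept in an ordinary local variable
    for event, elem in parser.read_events():  # the caller asks for each event
        if elem.tag == "item":
            in_item = (event == "start")
        elif event == "end" and elem.tag == "title" and in_item:
            found.append(elem.text)
    return found

sample = ("<feed><title>Feed</title>"
          "<item><title>A</title></item>"
          "<item><title>B</title></item></feed>")
print(titles(sample))
```

The point of the design is that the loop belongs to the application: there is no handler object whose fields must remember where the parser is, as there would be with a SAX ContentHandler.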


The W3C Key Management Working Group has updated three working drafts:

According to the Requirements intro, "XML-based public key management should be designed to meet two general goals. The first is to support a simple client's ability to make use of sophisticated key management functionality. The second is to provide public key management support to XML applications that is consistent with the XML [XML] architectural approach. In particular, it is a goal of XML key management to support the public key management requirements of XML Encryption [XML Encryption] and XML Digital Signature [XMLDSIG] and to be consistent with the Security Assertion Markup Language [SAML]. This specification provides requirements for XML key management consistent with these goals."


Daniel Veillard's updated the libxml2 XML C library to version 2.4.19. Libxml supports XML 1.0, Namespaces, XML Base, XPath, XPointer, HTML4, XInclude, SGML Catalogs, and XML Catalogs. Version 2.4.19 fixes some bugs in XPath, validation and the UTF8 encoder. It also adds better makefiles for Windows and improves portability somewhat.

He's also released libxslt 1.0.15, an XSLT C library for Linux, Unix, and Windows. 1.0.15 fixes some bugs in XPath, attribute sets, and template matching rules. It also adds better makefiles for Windows and improves performance a little.

Monday, March 25, 2002

The W3C RDF Core Working Group has published the first public working draft of RDF/XML Syntax Specification (Revised). According to the abstract, this "specification defines an XML syntax for the Resource Description Framework (RDF) as amended and clarified by the RDF Core Working Group from that originally described in RDF Model & Syntax. The syntax is updated to be specified in terms of the XML Information Set with new support for XML Base. For each part of the syntax, it defines the mapping rules for generating the RDF graph as defined in the RDF Model Theory. This is done using the N-Triples graph serializing test format which enables more precise recording of the mapping in a machine processable and testable form. These tests are gathered and published in the RDF Test Cases."

The same working group has also published for the first time an RDF Primer. According to its abstract, "The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, the copyright and syndication information about a Web document, the availability schedule for some shared resource, or the description of a Web user's preferences for information delivery. RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. Exchanging information between different applications means that the information may be made available to applications other than those for which it was originally created. This Primer is designed to provide the reader the basic fundamentals required to effectively use RDF in their particular applications."

Sunday, March 24, 2002

Jochen Wiedmann's released JaxMe 1.2, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. It does support JDBC mapping to an SQL table and reading from joined tables. This release adds basic support for simpleType definitions.

Friday, March 22, 2002

Sun's posted the second early access release of the Java Web Services Developer Pack and Java Web Services Tutorial. This includes the Java XML Pack Spring 02 release, the JavaServer Pages Standard Tag Library (JSTL) 1.0 Beta 1, Ant Build Tool 1.4.1, Java WSDP Registry Server 1.0 EA2, Web Application Deployment Tool, and the Apache Tomcat 4.1-dev servlet container. Changes in this release include:

  • JSTL updated to the Public Draft of the specification, including the Expression Language.
  • UDDI test registry updated to UDDI v2
  • DeployTool
  • AdminTool

The Spring 2002 version of the Java XML Pack is also available separately. This bundles various XML-related technologies for Java including the Java API for XML Messaging (JAXM) v1.0.1 EA2, the Java API for XML Processing (JAXP) v1.2 EA2, the Java API for XML Registries (JAXR) v1.0 EA2, and the Java API for XML-based RPC (JAX-RPC) v1.0 EA2. Notable additions to this version of the Java XML Pack include:

  • JAX-RPC implements draft 0.7 of the specification.
  • JAXP has been updated to the latest JAXP 1.2 draft and includes Xalan XSLTC.
  • JAXR now supports UDDI v2.

Note that if you're using this with Java 1.4, you'll need to put the JAR files in your jre/lib/endorsed directory to override the ones bundled with the JDK.


Norm Walsh has posted version 1.50.0 of his XSL stylesheets for DocBook. I use these to format Processing XML with Java, both the HTML and the PDF versions.

Version 1.5 fixes some (though not all) of the problems with synopsis formatting in XSL-FO. However, it introduces some major issues with program listings in justified text and adds an extra inch or so of top margin on all of my pages, for reasons I haven't yet determined. The HTML stylesheets now use the em element for emphasis instead of i. Otherwise, they're mostly unchanged. For the moment, I'm sticking to version 1.4.8 and recommend you do so too.

Thursday, March 21, 2002

I've posted XPath, Chapter 16 of Processing XML with Java, here on Cafe con Leche. It covers integrating XPath searches into your programs, a little-known but very powerful technique that should be in every XML developer's toolbox. XPath-based programs are often far more robust and reliable than programs that use SAX or DOM to perform tree navigation and searching. XPath searches will often succeed even when the document format is not quite what you expected. For example, a comment in the middle of a paragraph of text may break DOM code that expects to see contiguous text. XPath wouldn’t be fazed by this. Many XPath expressions are resistant even to much more significant alterations such as changing the names or namespaces of ancestor elements or adding or subtracting levels from the tree hierarchy. While you could write your SAX/DOM/JDOM programs to handle these cases, it's about a hundred times easier to do it with XPath.
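To make the comment example concrete, here is a small sketch. It is not taken from the chapter, which uses Java; it assumes Python's standard `xml.dom.minidom` for the fragile DOM-style code and `xml.etree.ElementTree`, which implements only a subset of XPath, for the path-based version. The function names and sample documents are hypothetical.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

def naive_text(xml_text):
    """Fragile DOM-style code: assumes the element holds one text node."""
    para = minidom.parseString(xml_text).documentElement
    return para.firstChild.data

def string_value(xml_text):
    """Concatenate all descendant text, as XPath's string() function does."""
    return "".join(ET.fromstring(xml_text).itertext())

def titles(xml_text):
    """Find title elements at any depth with a descendant path."""
    return [t.text for t in ET.fromstring(xml_text).findall(".//title")]

plain = "<p>Hello world</p>"
commented = "<p>Hello <!-- note -->world</p>"
print(repr(naive_text(commented)))    # only the text before the comment
print(repr(string_value(commented)))  # the whole paragraph

flat = "<book><title>XPath</title></book>"
nested = "<library><shelf><book><title>XPath</title></book></shelf></library>"
print(titles(flat), titles(nested))
```

The descendant path keeps working when extra levels are added around the book element, whereas code that walks a fixed number of child steps would have to be rewritten.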

For developers who aren't yet familiar with XPath, this chapter also includes a brief tutorial on the XPath data model and expression syntax. Although I've written things like this before, this is the first one to treat XPath from the perspective of a Java developer using XPath rather than mainly as a part of XSLT.

Although I'm writing out of order right now, nothing in this chapter depends on anything in the unpublished chapters 14 and 15. If you've read the DOM chapters (or are reasonably familiar with DOM), you know everything you need to know before digging into this one. As usual, I'd like to hear of any comments, criticisms, corrections, caveats, calumnies, and any other C-words you may care to lob in my direction about this chapter.

Wednesday, March 20, 2002

The DocBook Technical Committee has posted a candidate release of DocBook 4.2 in both XML and SGML. Most of the changes are fairly minor additions such as allowing SimpleSects inside Sections and adding a newsgroup class to systemitem. It looks like it should be backwards compatible with existing DocBook 4.1.2 documents.


Xindice 1.0, a native XML database, has been released by the Apache XML Project. It is, of course, open source under the Apache Software License. Xindice supports XPath for queries and XML:DB XUpdate for XML updates and the XML:DB XML database API for Java as well as an XML-RPC interface.


Version 1.5.1 of the AxKit open source, Perl-based XML application server, has been released. AxKit "provides on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code. AxKit also uses a built-in Perl interpreter to provide some amazingly powerful techniques for XML transformation." This is primarily a bug-fix release, and also marks the debut of AxKit as another Apache project.


Design Science has posted a public beta of MathPlayer, a free-beer MathML display plug-in for Internet Explorer 5.5 and later on Windows.


REXML 2.0 is an open source, non-validating XML parser for Ruby that includes incomplete XPath support. The author says he based the design on ElectricXML, which worries me because ElectricXML is easily the least correct Java API I've seen for processing XML. (A lot of novice developers like ElectricXML because it cuts corners to make XML seem less complex than it really is. An API should be as simple as it can possibly be and no simpler.) However, I don't know anything about Ruby, and can't really judge whether ElectricXML's problems have been corrected in REXML or not. REXML is implemented in pure Ruby. It is dual licensed under both the Ruby license and the GPL.


Daniel Veillard's updated the libxml2 XML parser for Linux to version 2.4.18 and the libxslt XSLT processor for Linux to version 1.0.14. Among other things, this release significantly speeds up processing of some of the DocBook stylesheets.


Michael Fuchs's DocBook Doclet 0.16 generates DocBook SGML or XML code from Java source documentation. I should check this out to see if I can use it for a first pass at the reference appendixes of Processing XML with Java, which is being written in DocBook. The DocBook Doclet is published under the GPL.


In other DocBook news, OASIS has published version 1.1 of the DocBook HTML Forms Module. Version 1.1 parameterizes the HTML element names so that the namespace prefix can be changed on a per-document basis.


And in still more DocBook news, the DocBook technical committee has posted version 0.2 of the XML Character Entities draft that defines XML encodings of the standard SGML character entity sets including:

  • Added Latin 1
  • Added Latin 2
  • Greek Letters
  • Monotoniko Greek
  • Russian Cyrillic
  • Non-Russian Cyrillic
  • Numeric and Special Graphic
  • Diacritical Marks
  • Publishing
  • Box and Line Drawing
  • General Technical
  • Greek Symbols
  • Alternative Greek Symbols
  • Added Math Symbols: Ordinary
  • Added Math Symbols: Binary Operators
  • Added Math Symbols: Relations
  • Added Math Symbols: Negated Relations
  • Added Math Symbols: Arrow Relations
  • Added Math Symbols: Delimiters

In addition, this draft suggests a new way to use these characters in XML documents without DTDs through "XML character elements". This is an element named character in the http://www.oasis-open.org/docbook/xmlcharent/names namespace, whose name or entity attribute identifies the character to use. For example,

<doc xmlns:e="http://www.oasis-open.org/docbook/xmlcharent/names">
  <p>
    This document uses the character names element to access
    character entities, such as "<e:char name="eacute"/>" and
    "<e:char name="COPYRIGHT SIGN"/>".
  </p>
</doc>
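
How a processor might expand such character elements can be sketched as follows. This is an illustration only, not the draft's algorithm: it assumes Python's `html.entities` table as a stand-in for the SGML entity names and `unicodedata.lookup` for Unicode character names, and the `char_for` and `expand_text` helpers are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Namespace URI taken from the example above, in ElementTree's {uri} notation.
NS = "{http://www.oasis-open.org/docbook/xmlcharent/names}"

def char_for(name):
    """Map an entity name or a Unicode character name to the character."""
    if name in name2codepoint:          # assumption: HTML table stands in for the SGML sets
        return chr(name2codepoint[name])
    return unicodedata.lookup(name)     # raises KeyError for unknown names

def expand_text(elem):
    """Return the element's text with each char element replaced."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == NS + "char":
            parts.append(char_for(child.get("name")))
        else:
            parts.append(expand_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

doc = ET.fromstring(
    '<doc xmlns:e="http://www.oasis-open.org/docbook/xmlcharent/names">'
    '<p>caf<e:char name="eacute"/> <e:char name="COPYRIGHT SIGN"/></p></doc>'
)
print(expand_text(doc.find("p")))
```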

Tuesday, March 19, 2002

Johannes Dobler's released version 1.2.6 of jd.xslt, an open source XSLT processor written in Java that supports most of the now defunct XSLT 1.1 working draft.


Sun's posted the proposed final draft specification of the Java API for XML Registries (JAXR). "This version of the JAXR specification includes detailed bindings between the JAXR information model and both the ebXML Registry and the UDDI Registry v2.0 specifications."

Monday, March 18, 2002

Hewlett-Packard's posted a note to the W3C about the Web Services Conversation Language (WSCL) 1.0. According to the abstract, "WSCL allows the abstract interfaces of Web services, i.e. the business level conversations or public processes supported by a Web service, to be defined. WSCL specifies the XML documents being exchanged, and the allowed sequencing of these document exchanges. WSCL conversation definitions are themselves XML documents and can therefore be interpreted by Web services infrastructures and development tools. WSCL may be used in conjunction with other service description languages like WSDL; for example, to provide protocol binding information for abstract interfaces, or to specify the abstract interfaces supported by a concrete service."

Sunday, March 17, 2002

I've posted my notes from the recently concluded XML & Web Services 2002 conference in London. Talks I gave included:

The Advanced XML Programming all-day tutorial contained lots of new material on DOM Level 3, XPath 2.0, and XSLT 2.0 that I hadn't delivered previously. There are also some new notes in the namespaces talk on how namespaces work in SAX, DOM, and JDOM.


Late Night Software has released version 2.4 of its free-beer XML Tools AppleScript scripting addition. This release fixes a number of conformance bugs in earlier versions. Mac OS 8.5/AppleScript 1.3 or later are required.

Saturday, March 16, 2002

Microsoft's posted a service pack for their MSXML 4.0 XML parser/XSLT processor for Windows. This release fixes a number of bugs in the original release of MSXML 4.0.

Friday, March 15, 2002

John Cowan's ported his Itsy Bitsy Teeny Weeny Simple Hypertext DTD (IBTWSH) to the RELAX NG schema language. IBTWSH is a small subset of XHTML Basic, suitable for use in representing static marked-up texts and in embedding into other document types (for documentation, e.g.)


Sun's posted a sample implementation of the XML Pipeline Definition Language:

XML Pipeline is an XML vocabulary for describing the processing relationships between XML resources. A pipeline document specifies the inputs and outputs to XML processes, and a pipeline controller uses this document to figure out the chain of processing that must be executed in order to get a particular result.

The XML Pipeline Definition Language Controller Implementation is a free Ant-based sample implementation of an XML Pipeline controller. This controller implementation can be used to manage validations, transformations, and similar XML processes.

Registration is required.

Thursday, March 14, 2002

The XML Apache Project has posted version 0.20.3 of the FOP XSL formatting objects processor. FOP can display an XSL-FO file in a GUI or convert it to text, RTF, PDF, or PostScript. The main new feature of this release is conformance to the XSL-FO Version 1.0 W3C Recommendation. Other changes include:

  • Support for CMYK and embedded ICC profiles in JPEG images
  • Support for EPS images
  • Improved font encodings for native (Acrobat) fonts
  • Internationalization improvements in the AWT viewer
  • Support for letter-spacing
  • Polish, Greek, and Hungarian hyphenation

This is still not a complete implementation of XSL-FO 1.0, but it's quite useful. I'm using it to make PDFs of Processing XML with Java. I'll update the online version of Chapter 18 of the XML Bible to cover this release and XSL-FO 1.0 soon.


Version 0.9.9 of Mozilla has been posted for the usual batch of platforms (Windows, Linux, and MacOS with more builds to come). MathML is now supported by default on Windows and Unix, though you do need to install special fonts. The JavaScript debugger can now profile code as well as debug it. TrueType fonts are supported on Unix. LDAP directories are now included in the address book, and many other small new features are added and bugs fixed. Unfortunately, there's still an annoying AppleScript bug that prevents me from using it as my main browser on the Mac, but it's definitely the best browser out there for Windows and Linux. The option to turn off pop-up ads alone leaves all other browsers behind.

Saturday, March 9, 2002

I'm travelling right now so updates may be a little slow until March 14.


The XML Apache Project has released version 1.7.0 of Xerces-C++, an open source validating and schema-validating XML parser written in fairly generic C++ that supports DOM2 and SAX2. New features in this release include:

  • Support for the SAX DeclHandler class
  • Directory sane_include reorganization
  • More IDOM test cases
  • Support IconvFBSD in multi-threading environment
  • Use IDOM in schema processing for faster performance.
  • Project files for Borland C++ Builder 6.
  • Caldera (SCO) OpenServer port
  • Support the new MacOSURLAccessCF NetAccessor
  • Assorted bug fixes, leak fixes and performance improvements

Friday, March 8, 2002

The W3C Web Ontology Working Group has published a working draft of Requirements for a Web Ontology Language. According to the draft, "An ontology formally defines a common set of terms that are used to describe and represent a domain. Ontologies can be used by automated tools to power advanced services such as more accurate Web search, intelligent software agents and knowledge management."

Thursday, March 7, 2002

Sun's posted version 0.8, the proposed final draft, of the Java API for XML-Based RPC (JAX-RPC) specification. The most obvious change is that the name is no longer as likely to be confused with XML-RPC.


Sun's submitted Java Specification Request 172, J2ME Web Services Specification, to the Java Community Process. "The purpose of this specification is to define an optional package that provides standard access from J2ME to web services" including support for XML. Comments are due by March 18.


Michael Sintek and Stefan Decker have released TRIPLE, an open source "RDF query, inference, and transformation language which allows various semantics of Semantic Web languages to be defined, either directly with TRIPLE rules, or by accessing external inference engines (like a DL classifier)." TRIPLE contains a standalone DAML+OIL implementation.


XMLMind has released the XMLmind FO Converter (XFC), an XSL-Formatting Objects to RTF converter written in Java. The personal edition is free-beer. The professional edition with source code is payware. Pricing has not yet been announced.


Joe English has posted HXML version 0.2, a non-validating XML parser written in Haskell. It is designed for space-efficiency, taking advantage of lazy evaluation to reduce memory requirements. Changes in version 0.2 include:

  • New Arrow-based combinator library
  • Support for CDATA sections
  • New function parseDocument recognizes (and ignores) the document prolog (XML and DOCTYPE declarations)
  • Bug fixes

Wednesday, March 6, 2002

The XML Encryption Working Group has published candidate recommendations of XML Encryption Syntax and Processing and the Decryption Transform for XML Signature, and wouldn't you know they did this just a couple of days before I'm scheduled to fly out of here to London to talk about this very subject at XML & Web Services 2002? I guess I know what I'm doing on the flight over. At least this beats last year, when the schema working group published a new schema draft the morning of the day I was scheduled to talk about schemas, and I found myself out in the hallway during the preceding presentation hogging the single public Internet terminal at the show, frantically trying to figure out what had changed. After skimming this draft, I haven't spotted any major changes or new features since the October 18 draft, but I still need to read it more closely.


XML Web GUI, currently in its third alpha release, is an open source validating XML editor with a Web interface. It is based on XHTML, JavaScript, DOM and CSS on the client and servlets and Java Server Pages on the server.

Tuesday, March 5, 2002

The Apache XML Project has released version 2.0.1 of the popular open source Xerces-J XML parser. This release fixes a number of small bugs in Xerces 2. Notable changes include reporting line and column numbers for schema errors, and fixing an EntityResolver bug that was confusing a lot of programmers.


DecisionSoft has released two open source XML utilities written in Perl:

  • xmlpp pretty prints and indents XML documents.
  • xmldiff compares XML documents for differences

However, xmlpp and perhaps xmldiff suffer from some common novice misconceptions about XML. In particular:

  • White space doesn't matter.
  • Mixed content doesn't exist.

While these are sometimes true of specific XML vocabularies (though much less often than is commonly thought) neither of these is true in general, and utilities that operate on arbitrary XML documents should handle this. However, the tools are open source so maybe some Perl XML whiz can fix them. Both use only standard Perl modules and therefore should run from almost all Perl installations.
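
The general problem is easy to demonstrate with any naive pretty printer. The sketch below does not use xmlpp; it assumes Python's standard `xml.dom.minidom` and its `toprettyxml` method as a stand-in for a tool that indents without regard to mixed content. The `text_of` helper and the sample paragraph are hypothetical.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

def text_of(xml_text):
    """Return the concatenated text content of the document element."""
    return "".join(ET.fromstring(xml_text).itertext())

# Mixed content: there is deliberately no space between </b> and "now".
original = "<p>Click <b>here</b>now</p>"
pretty = minidom.parseString(original).documentElement.toprettyxml(indent="  ")

print(pretty)
print(text_of(original).split())  # the words a reader sees before indenting
print(text_of(pretty).split())    # the words a reader sees after indenting
```

Indenting puts new white space between "here" and "now", so the pretty-printed document no longer contains the same text as the original.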

Monday, March 4, 2002

Ejen is a GPL'd text generation system (where text includes programming source code) that uses Java/XML/XSLT technologies and is implemented as an Ant task. "This implementation allows the setup of a complete generation, compilation and deployment sequence, by maintaining only one 'build' file that indicates the order in which each of these actions must be achieved. Generally speaking, this system should be understood as a system that organizes a data flow, whose initial source is an XML file containing a (minimal) set of data (required by the generation process). The data flow grows by the fusion with other XML files and by going through XSL 'filters', until it is sufficiently detailed. It finally goes through XSL 'templates' to produce the resulting text files." A demo of an EJB 1.1 generation process is provided.

Sunday, March 3, 2002

The Apache XML Project has released Xalan-J 2.3.1 to add a JAR that was unintentionally omitted from the 2.3.0 distribution files. This JAR is only required for XSLTC, so if you're not using XSLTC, there's no need to upgrade.

Friday, March 1, 2002

Rogue Wave Software has posted the second alpha of Ruple, "a distributed computing technology built on an Internet tuple space for XML documents, providing a loosely coupled asynchronous and anonymous link between multiple senders and receivers. Ruple provides a foundation for asynchronous, loosely coupled, XML document exchange, allowing for collaborating applications while radically decoupling senders and receivers." "Ruple facilitates secure XML document exchange over the Internet or private networks; its key technical attributes include asynchrony, security, flexible document exchange, identity-based addressing and simplicity." New features in this release include:

  • MIME-based attachments, allowing you to attach binary files to XML documents.
  • A server-side download, letting you host your own space instead of using the Rogue Wave forum.
  • Comprehensive examples

Registration is required for download, and eventual pricing has not been set.


Norm Walsh and Eve Maler of Sun have submitted a note to the W3C on XML Pipeline Definition Language Version 1.0. According to the abstract, "Pipeline is an XML vocabulary for describing the processing relationships between XML resources. A pipeline document specifies the inputs and outputs to XML processes and a pipeline controller uses this document to figure out the chain of processing that must be executed in order to get a particular result." The detailed syntax looks a lot like Ant, but it's focused on XML rather than Java. It's important to note that this is just a note, and the W3C is not obligated to follow up on it. However, like RDDL and RELAX NG, that doesn't mean third parties can't work together and build tools around this, even if the W3C ignores it.
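
What "figuring out the chain of processing" might involve can be sketched in a few lines. This is not the note's syntax or algorithm; it is a minimal illustration that assumes each process is described only by its input and output resource names and that the dependencies contain no cycles. The `plan` function, the process names, and the file names are all hypothetical.

```python
def plan(target, processes, available):
    """Return process names in an order that produces the target resource.

    processes maps a process name to (inputs, outputs); available lists
    the resources that already exist. Assumes there are no cycles.
    """
    order = []
    done = set(available)

    def build(resource):
        if resource in done:
            return
        for name, (inputs, outputs) in processes.items():
            if resource in outputs:
                for needed in inputs:   # make every input first
                    build(needed)
                order.append(name)
                done.update(outputs)
                return
        raise LookupError(f"no process produces {resource!r}")

    build(target)
    return order

processes = {
    "validate": (["doc.xml"], ["valid.xml"]),
    "transform": (["valid.xml", "style.xsl"], ["out.html"]),
}
print(plan("out.html", processes, ["doc.xml", "style.xsl"]))
```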

Thursday, February 28, 2002

Ektron has released eWebEditPro+XML, a payware, browser-based, graphical HTML/XML editor for Windows and Unix. Pricing starts at $599 for ten users.

Wednesday, February 27, 2002

The Apache Xindice team has posted the second release candidate of the open source Xindice 1.0 native XML database (formerly known as dbXML). Xindice supports XPath for queries and XML:DB XUpdate for XML updates. The XML:DB XML database API supports Java, and other languages can access it through XML-RPC. (I have to say that part doesn't really make sense to me. It means that XML queries and responses have to be hidden inside #PCDATA. If SOAP were used instead of XML-RPC, then actual XML documents could be passed back and forth.) Changes since the last version are mostly bug fixes.
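
The #PCDATA point can be seen by serializing an XML fragment as an XML-RPC string parameter. This sketch assumes Python's standard `xmlrpc.client` module; the method name `db.query` is hypothetical and is not Xindice's actual interface.

```python
import xmlrpc.client

# An XML query result that a database might want to send back.
query_result = "<book><title>XPath</title></book>"

# XML-RPC has no parameter type for XML, so the fragment travels as a
# string and its markup is escaped into character data.
payload = xmlrpc.client.dumps((query_result,), methodname="db.query")
print(payload)

# The receiver gets the original string back, but as text it must parse again.
params, method = xmlrpc.client.loads(payload)
print(params[0] == query_result)
```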

Tuesday, February 26, 2002

The W3C Patent Policy Working Group has published a working draft of Royalty-Free Patent Policy. This is close to a 180 degree shift in W3C thinking on the issues. A couple of relevant quotes:

In order to promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented on a Royalty-Free (RF) basis. Under this policy, W3C will not approve a Recommendation if it is aware that Essential Claims exist which are not available on Royalty-Free terms.

To this end, RF Working Group charters will include as a requirement that the specification produced by the Working Group will be implementable on an RF basis, to the best ability of the Working Group and the Consortium.

This is actually stronger than it sounds because the W3C is using an unusual definition of "royalty-free" that covers not just royalties but any form of payment.

Even more importantly, participation in development of a spec now requires participants to provide royalty-free patent licenses:

As a condition of participating in a Working Group, each W3C Member and invited expert agrees to make any Essential Claims it controls available on RF terms, as defined in this policy. With the exception of the provisions of section 2.2 below, this licensing commitment is binding on participants for the life of the patents in question, regardless of changes in participation status or W3C Membership.

Participants can exclude specific patents, but only within 60 days of publication of the relevant requirements document or working draft. Furthermore, if an essential patent is excluded a Patent Advisory Group (PAG) will be formed. This group will determine whether:

  1. The initial concern has been resolved, enabling the Working Group to continue.
  2. The Working Group should be instructed to consider designing around the identified claims.
  3. The Team should seek further information and evaluation, including but not limited to evaluation of the patents in question or the terms under which acceptable licensing may be available.
  4. The specification under development should be produced on RAND (reasonable and non-discriminatory) terms, either at W3C or some other body. Note that there is not yet any process for developing or issuing RAND specifications. Therefore if a PAG makes a recommendation to proceed on RAND terms, Advisory Committee review and Director's decision will be required. It is also possible that a PAG could recommend that the work be taken to another organization.
  5. The Working Group should be terminated.
  6. The Recommendation (assuming it has already been issued) should be rescinded.

Overall, this document demonstrates a clear presumption in favor of patent-unencumbered technologies. However, it does allow an escape hatch when a patent-holder is simply not willing to budge.


Opera Software ASA has posted the first beta of Opera 6.0 for Linux. They've also released localized versions of Opera 6.0.1 for Windows in several additional languages including German, Spanish, Danish, Dutch, Estonian, Finnish, French, Norwegian bokmål and nynorsk, Russian, and Swedish. Opera 6.0 supports direct display of XML in the browser with attached CSS style sheets.


FullXML 2, Beta 3 has been posted. "Fullxml is an instant Web Portal System. The goal of Fullxml is to have an automated web site to distribute news and content. Main features include: web based admin, surveys, access stats page with counter, user customizable box, themes manager, friendly administration GUI with graphic topic manager, option to edit or delete stories, option to delete comments, moderation system, Referers page to know who link us, sections manager, user and authors edit, search engine, and many, many more friendly functions. Fullxml is written 80% in XSL and 20% in ASP and requires IIS or PWS, MSXML 4 component and NO DATABASE. Support for 2 languages (English and French), many themes to choose from, File Manager, categorized articles and a lot more." Version 2 adds many new functions, including skins, page customizations, mail alert, member area, download section, site map, and banners.

Monday, February 25, 2002

Jasc Software (the Paint Shop Pro folks) have released WebDraw 1.0, a $149 payware SVG authoring tool for Windows. The download-only edition is $129. An evaluation version is available.

Sunday, February 24, 2002

The W3C Voice Browser Working Group has published the initial working draft of Voice Browser Call Control: CCXML Version 1.0. According to the spec abstract, "CCXML is designed to provide telephony call control support for VoiceXML or other dialog systems. CCXML has been designed to complement and integrate with a VoiceXML system. Because of this you will find many references to VoiceXML's capabilities and limitations. You will also find details on how VoiceXML and CCXML can be integrated. However it should be noted that the two languages are separate and are not required in an implementation of either language. For example CCXML could be integrated with a more traditional IVR system and VoiceXML could be integrated with some other call control system."

Saturday, February 23, 2002

IBM's alphaWorks has updated their P3P Policy Editor with some bug fixes and improved compatibility with the P3P Proposed Recommendation.

Friday, February 22, 2002

The W3C XML Core Working Group has published the Candidate Recommendation of XInclude. The big change in this draft is the addition of an xinclude:fallback element to provide an alternative in the event that a requested resource can't be found; e.g.

<xinclude:include href="remotefile.xml" parse="xml">
  <xinclude:fallback>Oops! Couldn't find remotefile.xml! </xinclude:fallback>
</xinclude:include>

Previously, the XInclude processor was required to give up and report an error if a resource couldn't be found. (It still is if there's no xinclude:fallback element.) Overall, this strikes me as a good idea, and the syntax seems reasonable. However, I do wish that W3C Working Groups would issue new working drafts before adding such major functionality. Past experience indicates that if I did have a big problem with this, the only response I'd get from the working group would be that it was too late in the process to change it.
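The fallback behavior can be tried out with a minimal Java sketch. It assumes a JAXP implementation that supports DocumentBuilderFactory.setXIncludeAware (an API that arrived in later JAXP releases, not the processor discussed here), the XInclude namespace URI http://www.w3.org/2001/XInclude, and a made-up file: URL that is expected not to exist. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XIncludeFallbackDemo {

    // Parses a document whose include target is assumed to be missing and
    // returns the text content of the root element after inclusion.
    static String resolve() {
        String xml =
            "<doc xmlns:xi='http://www.w3.org/2001/XInclude'>"
          + "<xi:include href='file:///no-such-dir-for-demo/remotefile.xml' parse='xml'>"
          + "<xi:fallback>Oops! Couldn't find remotefile.xml!</xi:fallback>"
          + "</xi:include>"
          + "</doc>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XInclude elements are identified by namespace
            factory.setXIncludeAware(true);  // ask the parser to process inclusions
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XInclude processing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve());
    }
}
```

If the xi:fallback child is removed from the sample document, the same missing resource should surface as an error instead of being silently replaced.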

Other changes include using IURI references instead of URI references. The difference is that an IURI can use non-ASCII characters like α and é directly without escaping them first. There are also a number of clean-ups in the specification text. For instance, the spec acknowledges that attribute types and namespaces may have to be fixed up in the inclusion process, and the proper handling of XPointers that select points is specifically addressed.
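To make the escaping concrete, here is a small sketch of what a plain URI reference needs for those two characters. It uses java.net.URI, which percent-escapes non-ASCII characters as UTF-8 bytes in toASCIIString(); the example.com address and the class name are made up for illustration.

```java
import java.net.URI;

final class IriEscapingDemo {

    // Converts a reference containing non-ASCII characters into the
    // all-ASCII, percent-escaped form that a plain URI reference requires.
    static String toEscapedUri(String reference) {
        return URI.create(reference).toASCIIString();
    }

    public static void main(String[] args) {
        // \u03b1 is the Greek letter alpha, \u00e9 is e with an acute accent
        System.out.println(toEscapedUri("http://example.com/\u03b1\u00e9.xml"));
    }
}
```

With IURI references, the unescaped form can appear directly in the href attribute and the processor takes care of the conversion.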

These changes don't seem too hard to integrate into my XInclude processor. I've also recently begun working on making the DOM version a lot more robust so it should work with more DOM implementations.


In an end-run around the W3C XPointer Working Group, Jonathan Borden and Simon St. Laurent have submitted a draft of "A generic fragment identifier syntax for URI references" to the IETF. Like the current version of XPointer it has 3 forms:

  • Bare names such as #foo
  • Child sequences (a.k.a tumblers) such as #/1/4/2
  • Scheme based fragments such as #xpath(/foo/bar[3]) and #xpointer(start-point(/foo/bar[3]))
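The three forms are easy to tell apart mechanically. The following sketch classifies a fragment identifier with deliberately simplified regular expressions; these patterns are an assumption made for illustration and are not the grammar from the draft (for instance, names are restricted to ASCII here).

```java
import java.util.regex.Pattern;

final class FragmentForms {

    enum Form { BARE_NAME, CHILD_SEQUENCE, SCHEME_BASED, UNKNOWN }

    // Simplified, ASCII-only approximations of the three forms
    private static final Pattern CHILD_SEQUENCE = Pattern.compile("(/[1-9][0-9]*)+");
    private static final Pattern SCHEME_BASED = Pattern.compile("[A-Za-z_][\\w.-]*\\(.*\\)");
    private static final Pattern BARE_NAME = Pattern.compile("[A-Za-z_][\\w.-]*");

    static Form classify(String fragment) {
        // Accept the fragment with or without its leading '#'
        String f = fragment.startsWith("#") ? fragment.substring(1) : fragment;
        if (CHILD_SEQUENCE.matcher(f).matches()) return Form.CHILD_SEQUENCE;
        if (SCHEME_BASED.matcher(f).matches()) return Form.SCHEME_BASED;
        if (BARE_NAME.matcher(f).matches()) return Form.BARE_NAME;
        return Form.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] samples = {
            "#foo", "#/1/4/2", "#xpath(/foo/bar[3])", "#xpointer(start-point(/foo/bar[3]))"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```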

Borden points out that, "It would be possible to drop the xpointer scheme and this draft becomes a very compact fragment identifier syntax for XML -- as well as being patent unencumbered"


The W3C Internationalization Working Group has published a new working draft of Character Model for the World Wide Web 1.0. The big change since the previous draft is moving from the position that "recipients MUST NOT normalize" to "recipients MUST check and reject un-normalized data".
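A "check and reject" recipient might look roughly like the sketch below. It assumes Unicode Normalization Form C as a stand-in for the draft's fuller definition of normalized text, and it uses java.text.Normalizer from later Java releases; the class and method names are hypothetical.

```java
import java.text.Normalizer;

final class NormalizationCheck {

    // Returns the text unchanged if it is already in NFC, and rejects it
    // otherwise. It never normalizes on the sender's behalf.
    static String requireNfc(String text) {
        if (!Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
            throw new IllegalArgumentException("Text is not in Unicode Normalization Form C");
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(requireNfc("caf\u00e9")); // precomposed e-acute: accepted
        try {
            requireNfc("cafe\u0301");                // e + combining acute: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```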

Thursday, February 21, 2002

The XML Apache Project has posted the second release candidate of FOP 0.20.3. Changes since the previous release candidate include:

  • Proper use of font encodings for "native" fonts
  • jimi.jar removed for license reasons
  • Improved Japanese support in the AWT viewer
  • Polish, Greek, and Hungarian hyphenation
  • Various bug fixes

Java 1.2 or later is required. This release adheres to the final recommendation of XSL 1.0, so you'll need to update your stylesheets if you haven't already.


Norm Walsh has updated his XSL and DSSSL stylesheets for DocBook to versions 1.49 and 1.75 respectively. I use the XSLT version to produce the online and printed versions of Processing XML with Java.


The W3C CSS Working Group has published the initial public working draft of CSS3 module: Lists. The main new features relative to CSS2 are additional styles for non-western numbering systems and a new :marker pseudo-element to address list bullets and labels.

Wednesday, February 20, 2002

The W3C CSS Working Group has published new working drafts for three new CSS Level 3 modules:

Backgrounds
"This draft describes the functionality that is proposed for CSS level 3 to describe backgrounds, such as background colors and background images." Newly defined background properties include background-clip, background-origin, background-size, background-quantity, and background-spacing.
Color
Properties include color, color-profile, opacity, and rendering-intent.
Cascading and inheritance
"This CSS3 module describes how values are assigned to properties. CSS allows several style sheets to influence the rendering of a document, and the process of combining these style sheets is called 'cascading'. If no value can be found through cascading, a value can be inherited from the parent element or the property's initial value is used."

Nyk Cowham's ported libxml2 and libxslt to MacOS X 10.1.2.

Tuesday, February 19, 2002

The Unicode Consortium and the W3C Internationalization Working Group/Interest Group have published Unicode Technical Report #20, Unicode in XML and other Markup Languages. This discusses which Unicode characters are and are not appropriate for use in XML and similar languages. One thing I note is that this report specifically recommends against the use of Unicode character #x2028, which XML 1.1 proposes to add to the list of legal white space.


Ron Bourret's updated his XML database products list with 18 new products and 23 updated descriptions of previously listed products.

Monday, February 18, 2002

Jato is an "XML language for transforming any XML document to/from any set of Java objects." The first release candidate of Jato Beta 4 is available via CVS. The release supports XPath, namespaces, TrAX, SAX, and JDOM beta 7.


Andy Clark's updated NekoHTML to version 0.2.2. This is a Xerces 2 Native Interface-based HTML parser. This should be able to parse most though not all real-world HTML. This release fixes a couple of bugs involving attributes and Xalan-J 2.3 compatibility.

Sunday, February 17, 2002

The W3C Scalable Vector Graphics Working Group has published Last Call working drafts of Scalable Vector Graphics (SVG) 1.1 Specification and Mobile SVG Profiles: SVG Tiny and SVG Basic. Last Call ends March 15, 2002.

Saturday, February 16, 2002

Dave Pawson has published An Introduction to XSL Formatting Objects, an online book about XSL-FO with lots of examples.

Friday, February 15, 2002

The W3C/IETF XML Signature Working Group has published the finished Recommendation of XML-Signature Syntax and Processing. This describes a mechanism for digitally signing XML and other documents using public key cryptography and embedding those signatures in XML documents. I'll be talking about this at the XML 2002 show in London next month in my Advanced XML tutorial.

The same group has also published a proposed recommendation of Exclusive XML Canonicalization Version 1.0. This attempts to rework Canonical XML so that it can be reliably used for subdocuments that may be embedded at different places in different documents with possibly different namespace bindings, space treatment, language tagging, base URIs, and everything else that might potentially be inherited from the subdocument's ancestor elements.


The Apache XML Project has released version 2.3 of Xalan-J, an open source XSLT engine written in Java. Changes since 2.2 include:

  • Based on Xerces-J 2.0.0 instead of 1.4.4
  • Various bug fixes and performance improvements
  • The Xalan-J 1.x backwards compatibility APIs have been removed.
  • XSLTC implementation has a new version of BCEL with a new bin/regexp.jar.

From the release notes, it does not appear that this release fixes the major problems with TrAX identity transforms I uncovered while working on Chapter 10 of Processing XML with Java. However, I really need to test that out. Update: I've tried it, and it's still broken.


The DOM Test Suite Group has released the first version of the DOM Conformance Test Suite, Level 1 Core. The tests are written in XML. XSLT stylesheets are used to produce Java and ECMAScript bindings for the tests.


The NetBeans XML team has released the finished version of the NetBeans XML module, an open source XML editor for the NetBeans 3.3.1 Integrated Development Environment (IDE). It's available through the NetBeans AutoUpdate center. Features include:

  • XML text editor with coloring and completion
  • XML tree editor
  • XML entity catalogs support
  • DTD documentator
  • Java code generators

Thursday, February 14, 2002

James Strachan's released dom4j 1.2. dom4j is an open source library for working with XML, XPath and XSLT in Java. This release adds "more optimal whitespace handling", a new Swing TableModel, and various bug fixes.

Wednesday, February 13, 2002

IBM's alphaWorks has updated the XML Schema Quality Checker to version 2.0. This release can validate schemas embedded in other documents (such as WSDL), improves checking, and can be invoked via a WSAD/Eclipse plug-in.


The W3C Quality Assurance Activity has posted two new working drafts describing a Quality Assurance (QA) framework, QA Framework: Introduction and QA Framework: Process & Operational Guidelines. Two more are planned covering QA Framework: Specification Guidelines, guidelines for writing better, more testable technical reports and QA Framework: Test Materials Guidelines, detailed guidelines for test materials, such as test suites and test tools. According to the Introduction document:

The documents in this family collectively aim to provide the W3C Working Groups with resources and tools for all phases and aspects of their quality and conformance activities,

  • planning and process setup,
  • writing better, more testable specifications,
  • coordination with internal and external groups,
  • and ultimately to building or acquiring conformance test materials.

Tuesday, February 12, 2002

The Apache XML Project has released XML Security v1.0.0 (Web site not yet updated). This is an implementation of security related XML standards including Canonical XML, and XML Signature Syntax and Processing. A compatible Java Cryptography Extension provider is required.

I was playing with this about a week ago when the version number was 0.0.1, and I have to say that unless a lot of work has happened in the last week, that's a much more accurate version number than 1.0.0. XML Security is incompatible with Java 1.4 and all versions of Xalan earlier than 2.2.0. (Java 1.4 bundles Xalan-J 2.2d10.) After much effort and with some assistance from the developers I was able to get this to compile. However, I never got it to run successfully and produce a digital signature or an encrypted document.

The dependence on the JCE is a large part of the usability problems in XML Security. Sun's efforts to comply with various laws regarding encryption have made the JCE so complex as to be virtually unusable in practice. I think the time has come for open source developers in countries not restricted by U.S. law to replace the JCE entirely, and deliberately design a crypto API from the ground up that is not compatible with the JCE and thus not encumbered by the JCE's baroque design.


DecisionSoft's released Pathan, an open source XPath module that implements the recent DOM Level 3 XPath Working Draft and works with Xerces-C and Xerces-P.


Michael Kay's released version 6.5.1 of Saxon, an open source XSLT processor written in Java. Version 6.5.1 "is a maintenance release that fixes about twenty known errors in Saxon 6.5."


eCube has released version 1.1 of catchXSL, an XSLT profiler. This release adds profiling output as XML and explicit output of template run-times.


Oracle's released version 9.2.0.1 of their XML Developer Kits for Java, JavaBeans, C++, C, and PL/SQL. This release fixes bugs and makes a few small changes and additions. The Java version adds support for JAXP 1.1. Registration is required to download this.


Altova's released version 4.3 of XML Spy Suite, an XML editor for Windows. This release adds a SOAP Debugger, an XPath Analyzer, and support for Microsoft SQL Server 2000 SQLXML. XML Spy Suite is $399 payware. Upgrades from version 4.2 are free.


Opera Software ASA has released version 6.0.1 of Opera for Windows, a web browser that supports XML styled with CSS. Mostly this is a bug fix release. However, for the first time, Opera is available in Japanese. Opera is $39 payware or free-beer adware (your choice).

Monday, February 11, 2002

I've posted Output from DOM, Chapter 13 of Processing XML with Java. This chapter answers the question of how, once you've built a DOM document in memory, do you stuff it back into a text file. The answers are distressingly implementation dependent, but I tried to stick to the most broadly useful techniques. This chapter includes a lot of bleeding edge material from DOM Level 3 and Xerces-2.

This is the last major chapter I have planned on DOM. Chapters 9 through 13 cover pretty much the entire range of the Document Object Model Level 2, with more than a few excursions into DOM3. Possibly, I'll add a chapter on abstract schemas at a later point if the implementations gel enough by the time I have to hand over the finished book to the publisher, and I'll probably discuss DOM3 XPath in Chapter 16. However, right now these five chapters pretty effectively cover the state of the art in DOM programming. The next two chapters are scheduled to cover JDOM, but I'm probably going to go out of order and write chapters 16 and 17 next. These cover XPath and XSLT APIs.


Speaking of bleeding edge features in Xerces 2.0, Andy Clark's written NekoHTML, an HTML DOM parser based on the Xerces Native Interface, a SAX-like event driven interface for parsing XML and now HTML. It's currently only available under an "experimental" license; i.e. you can play with it but not use it for real work.


IBM's alphaWorks has posted version 3.0.1 of their Web Services Toolkit, a "software development kit that includes a run-time environment, a demo, and examples to aid in designing and executing Web service applications that can automatically find one another and collaborate in business transactions without additional programming or human intervention. Simple examples of Web services are provided, as well as demonstrations of how some of the emerging technology standards, such as SOAP, UDDI, and WSDL, work together." Version 3.0.1 just makes some corrections and fixes a bug in Linux installation. Java 1.3 or later is required.

Sunday, February 10, 2002

Happy birthday XML! Tim Bray reminded me that today, February 10, is the fourth anniversary of the official publication of the XML 1.0 Recommendation, 1st edition. What a long strange trip it's been.

Peter Murray-Rust and Henry Rzepa have announced STM-ML - an XML application for scientific, technical and medical publishing. It focuses on the constructs that occur frequently across disciplines including:

  • Numeric data, with scientific units
  • Regular structures of homogeneous data types (arrays, matrixes, tables)
  • Containers for metadata
  • Dictionaries for scientific terms, including dictionary-driven constraints (datatypes, values, enumerations, etc.)
  • Abstract objects for scientific discourse (object, action, observation, etc.)

Fittingly, for this anniversary, this grows out of work they originally did on the Chemical Markup Language, perhaps the first public XML application.

Friday, February 8, 2002

The W3C DOM Activity has published new working drafts of the Document Object Model (DOM) Level 3 XPath Specification and the Document Object Model (DOM) Level 3 Events Specification. The XPath specification defines a basic API for using XPath expressions like /song/composer/first_name to select nodes from a DOM Document object. The events specification "defines the Document Object Model Events Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model Events Level 3 builds on the Document Object Model Events Level 2." DOM3 events adds EventListenerList, EventGroup, EventTargetGroup, DocumentEventGroup, and TextEvent interfaces.
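For a feel of what selecting nodes from a DOM Document with an expression like /song/composer/first_name involves, here is a rough sketch. Note the swap: it uses the javax.xml.xpath API from later JAXP releases rather than the DOM Level 3 XPath interfaces described in the working draft, and the sample document and class name are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XPathSelectDemo {

    // Parses the XML into a DOM Document and returns the text of the first
    // node selected by /song/composer/first_name, or null if none matches.
    static String firstName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(
                "/song/composer/first_name", doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalStateException("XPath selection failed", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample data
        String xml = "<song><composer><first_name>Hector</first_name></composer></song>";
        System.out.println(firstName(xml));
    }
}
```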

Thursday, February 7, 2002

Tim Bray's posted a "thought experiment" for XML Skunk Works, a new version of XML that's mostly compatible with existing XML parsers and practice. It does this by subsetting XML to remove internal DTD subsets (with external DTD subsets strongly deprecated). This has the effect of removing entities as well. It adds namespaces, xml:base, and the infoset into the core specification. The resulting specification is much cleaner and easier to comprehend than the originals. It focuses almost entirely on well-formedness, with validity left to other specs like schemas. It differs from XML 1.1 in that the XML 1.0 character rules are maintained, although this will probably be changed in a future draft. It differs from efforts like Minimal XML in that the spec does not simplify XML for parser writers at the cost of complexifying it for document authors (e.g. CDATA sections and empty-element tags are retained).

I'm not sure what I think of this yet. It looks promising, especially since it seems possible that the set of all SkunkWorks documents might be a proper subset of the set of all XML 1.0 documents. This would allow existing parsers to work unchanged on SkunkWorks documents, something that is decidedly not true of XML 1.1. That would be a very good thing. At the same time, there are some significant costs to this proposal. For instance, it would no longer be possible to define character entities like &copy; and &nbsp; in XML documents. That's a major loss. It will be interesting to see where this goes.


Sun's posted the Winter 01 Update Release of the Java XML Pack. This bundles:

  • Java API for XML Messaging (JAXM) v1.0.1 EA1
  • Java API for XML Processing (JAXP) v1.2 EA1
  • Java API for XML Registries (JAXR) v1.0 EA
  • Java API for XML-based RPC (JAX-RPC) v1.0 EA1

Sun's also posted the second public review draft specification of the Java API for XML-based RPC.

Wednesday, February 6, 2002

Norm Walsh has posted the first beta of DiffMk 2.0, a tool for generating changebars automatically for DocBook (and XHTML and XML Spec) documents. Version 2.0 has been rewritten in Java, adds a GUI, and supports word-level diffing.


IBM's alphaWorks has updated the XSL Formatting Objects Composer, "a typesetting and display engine that implements a substantial portion of XSL Formatting Objects", to correctly resolve file names read from the command-line, add property expression evaluation, and reduce the size of the generated PDF files.

Tuesday, February 5, 2002

Version 0.9.8 of Mozilla has been released for the usual batch of platforms (MacOS, Linux, Windows, and OS/2). New features in this release include:

  • Hebrew is now supported on Solaris, and Hebrew and Arabic supported on MacOS systems that have the proper language pack installed.
  • The classic theme now has native looking widgets on MacOS X and Windows XP.
  • Many functionality fixes in the address book
  • Mozilla supports MNG again.
  • Dynamic theme switching mostly works
  • Mozilla no longer reads /favicon.ico images by default although Mozilla still reads page icons defined with the <link> tag.
  • Files automatically downloaded by clicking on a link are now renamed to the correct name when the download finishes.
  • DOM Inspector is included on MacOS builds.
  • A new Page Setup dialog allows setting margins, page orientation and page scaling.
  • Composer is now able to generate CSS inline styles instead of deprecated HTML elements and/or attributes

Monday, February 4, 2002

John Cowan's begun work on Architectural Forms: A New Generation (current draft version 2.1). From the draft:

AF:NG provides the facilities, but does not employ the syntax, of SGML Architectural Forms. AF:NG is intended to be used in conjunction with the schema language RELAX NG, but is not dependent on it in any way.

The purpose of AF:NG is to provide for tightly specified transformations of XML documents, consisting of renaming or omitting elements, attributes, and character data. AF:NG is not intended as a general-purpose transformation language like XSLT or Omnimark. Using AF:NG, a recipient may, instead of specifying a schema to which documents must conform exactly, specify a schema to be applied to the output of an AF:NG transformation. In that way, the actual element and attribute names, and to some degree the document structure, may vary from the schema without rendering the document unacceptable. In particular, it is easy to use AF:NG to reduce a complex document to a much simpler one, when only a subset of the document is of interest to the recipient.


FOA 0.2.0 is an open source Java application that provides a graphical user interface for authoring XSL Formatting Objects documents. It sits on top of a customized version of the XML Apache Project's FOP.


The Apache Cocoon team has released version 2.0.1 of the Cocoon XML publishing framework. Cocoon interacts with most data sources, including files, relational databases, native XML databases, and network-based data sources. It adapts content delivery to the capabilities of different devices such as HTML, WML, PDF, SVG, and RTF.

Sunday, February 3, 2002

Norman Walsh has released version 1.1 of his XML Entity and URI Resolvers Java classes that support using Catalog files to perform entity resolution. This implements the OASIS XML Catalogs specification as well as OASIS TR9401 Catalogs. Version 1.1 is now open source (Apache license).

Friday, February 1, 2002

The W3C Synchronized Multimedia Working Group has published a note containing an XHTML+SMIL Profile. According to the abstract, this "defines a set of XHTML abstract modules that support a subset of the SMIL 2.0 specification.  It includes functionality from SMIL 2.0 modules providing support for animation, content control, media objects, timing and synchronization, and transition effects. The profile also integrates SMIL 2.0 features directly with XHTML and CSS, describing how SMIL can be used to manipulate XHTML and CSS features. Additional semantics are defined for some XHTML elements and CSS properties."

Thursday, January 31, 2002

David Brownell's released SAX 2.0.1. This is a bugfix release, primarily focussed on improving incomplete specifications of the methods in the JavaDoc. The core API has not changed at all aside from adding a few missing exception constructors and one extra exception declaration in DefaultHandler. Changes include:

  • XMLReaderFactory uses the right class loader on JDK 1.2 (and later); same for (SAX 1.0) ParserFactory
  • XMLReaderFactory can also be configured using a system resource (META-INF/services)
  • NamespaceSupport now enforces declare-before-use
  • ParserAdapter won't use-before-declare and doesn't depend on JDK 1.2
  • Various bug fixes in AttributesImpl
  • SAX 2.0 Extensions 1.0 (DeclHandler and LexicalHandler) is now bundled but is still optional from a conformance perspective.
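As an illustration of the declare-before-use ordering mentioned for NamespaceSupport, the sketch below declares a prefix in a fresh context and only then resolves a qualified name. The prefix, namespace URI, and class name are made up, and whether a given SAX build actually throws on a late declaration is not something this sketch relies on.

```java
import org.xml.sax.helpers.NamespaceSupport;

final class NamespaceSupportDemo {

    // Declares one prefix in a new context, then resolves a qualified
    // element name. The result is {namespace URI, local name, qualified name},
    // or null if the prefix was never declared.
    static String[] resolve(String prefix, String uri, String qName) {
        NamespaceSupport support = new NamespaceSupport();
        support.pushContext();              // new scope, as at a start-tag
        support.declarePrefix(prefix, uri); // declare first...
        String[] parts = support.processName(qName, new String[3], false); // ...then use
        support.popContext();               // leave the scope, as at an end-tag
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = resolve("ex", "http://example.com/ns", "ex:title");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```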

I tried out Xerces 2.0.0 yesterday, and so far I'm not too happy with it. It definitely breaks my DOM XInclude processor (which was also broken with Xerces 1.4.4, but worked with Xerces 1.4.3.) The DOM Level 3 experimental material is all in the wrong packages, and requires using Xerces-specific classes. Thus you can use these classes, but you can't construct them in the way the DOM3 specs intend. The API documentation is divided into multiple sets for each API (i.e. one for DOM3, one for XNI, one for implementation classes, etc.) which makes browsing the documentation a lot more difficult. For the moment, I've reverted to Xerces-J 1.4.3.

Wednesday, January 30, 2002

The XML Apache Project has released Xerces-J 2.0.0, an open source, schema-validating XML parser written in Java. The internals of Xerces-J have been rewritten from the ground up to be much cleaner, much more extensible, and much more comprehensible. Xerces 2 introduces the Xerces Native Interface (XNI). XNI is a streaming API, similar in design to SAX. However, it attempts to provide more information than SAX2 does including the XML declaration, CDATA sections, parameter entities, text declarations, conditional sections in DTDs, and various post-schema-validation-infoset (PSVI) information. Of course Xerces also supports DOM1, DOM2, SAX1, SAX 2, and JAXP 1.1. There's some experimental support for DOM Level 3 including the Core, abstract schemas, and load and save modules.

Programs that only use the standard APIs to access Xerces-1 should continue to run with Xerces-2 without even a recompile. However, the parser is supposed to be faster and use less memory, especially when it comes to schemas. Doubtless a few bugs and inconsistencies will surface as this makes its way out into the broader community, but past experience with Xerces suggests these will be fixed reasonably quickly.
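A program of that kind might look like the following sketch, which touches only JAXP and SAX2 and never names a Xerces class, so whichever parser the JAXP factory finds gets used. The element-counting task and the class name are chosen purely for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ElementCounter extends DefaultHandler {

    private int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // one callback per start-tag or empty-element tag
    }

    // Counts the elements in a document using only the standard APIs.
    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            ElementCounter handler = new ElementCounter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```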

I intend to upgrade immediately because I need to explore some of the new features for Processing XML with Java, as well as my upcoming Bleeding Edge of XML tutorial at XML and Web Services 2002 in London in March. A large portion of that session is going to focus on the new DOM3 features in Xerces. However, I recommend that most users stick with Xerces-J 1.4.3 (1.4.4 was too buggy to recommend.) for a few more point releases while the bugs get shaken out and the APIs stabilize. DOM3 and XNI are still subject to change.

Tuesday, January 29, 2002

The W3C P3P Specification Working Group has posted the Proposed Recommendation of The Platform for Privacy Preferences 1.0 (P3P1.0) Specification. P3P enables Web sites to express their privacy practices in a standard XML format that can be retrieved automatically and interpreted by browsers. P3P aware browsers "allow users to be informed of site practices (in both machine- and human-readable formats) and to automate decision-making based on these practices when appropriate. Thus users need not read the privacy policies at every site they visit.

"Although P3P provides a technical mechanism for ensuring that users can be informed about privacy policies before they release personal information, it does not provide a technical mechanism for making sure sites act according to their policies. Products implementing this specification MAY provide some assistance in that regard, but that is up to specific implementations and outside the scope of this specification. However, P3P is complementary to laws and self-regulatory programs that can provide enforcement mechanisms. In addition, P3P does not include mechanisms for transferring data or for securing personal data in transit or storage. P3P may be built into tools designed to facilitate data transfer. These tools should include appropriate security safeguards."

An RDF Schema for P3P has also been published.


The W3C CSS Working Group has posted the last call working draft of Media Queries. According to the abstract:

HTML4 and CSS2 currently support media-dependent style sheets tailored for different media types. For example, a document may use sans-serif fonts when displayed on a screen and serif fonts when printed. "Screen" and "print" are two of media types that have been defined. Media queries extend the functionality of media types by allowing more precise labeling of style sheets.

A media query consists of a media type and one or more expressions to limit the scope of style sheets. Among the media features that can be used in media queries are "width", "height", and "color". By using media queries, presentations can be tailored to a specific range of output devices without changing the content itself.

Comments are due by February 20.

Monday, January 28, 2002

I'm happy to announce that I've posted "Creating New XML Documents with DOM," Chapter 10 of Processing XML with Java, here on Cafe con Leche. This chapter covers the Document and DOMImplementation interfaces with a particular focus on how you use DOM to create new XML documents and elements in memory rather than parsing them from a file. The last section discusses some new bleeding edge features from DOM3.

This chapter includes some interesting examples of SOAP and XML-RPC servlets. I've also gone back to the earlier chapters, particularly 2, 3, and 5, and rewritten the SOAP and XML-RPC client examples so that they point at my personal server at www.elharo.com. The firewall is open on port 80 so if you want to test those examples out, feel free. This is just a small Linux box running off my DSL connection so please don't hit it too hard. Also, I've been playing around with the DNS and web servers for both macfaq.com and elharo.com to get them working together on the same box, so if you notice anything funny like hosts not resolving or the wrong site coming up, please let me know. I think it's all working now, but I've said that before. :-)

One warning: the examples in this chapter use JAXP's ID transform for serialization. In the process of writing this chapter, I uncovered some nasty bugs in common implementations of the javax.xml.transform classes, particularly involving the output of namespace declaration attributes. I've reported the bugs to the various parser vendors and I'm hopeful they'll be fixed soon. In the meantime, not all the examples will produce namespace-well-formed output.
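The general shape of identity-transform serialization is sketched below. This is not the code from the chapter; the namespace URI, prefix, and class name are invented, and setTextContent is a DOM Level 3 method. Whether the serialized output carries the xmlns:ex declaration is exactly the behavior worth checking against a particular javax.xml.transform implementation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class IdentityTransformSerializer {

    // Builds a one-element document in a made-up namespace, entirely in memory.
    static Document sample() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS("http://example.com/ns", "ex:root");
            root.setTextContent("hello");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Serializes a DOM document by running it through a transformer that has
    // no stylesheet, which copies the source to the result unchanged.
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample()));
    }
}
```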

Together Chapters 9 through 12 form a really solid introduction to the Document Object Model. In fact, I think Chapters 1 through 12 are a really solid introduction to processing XML with Java and form the core of the book. There are still several chapters and appendixes to go. Nonetheless, I'd certainly feel comfortable using this as the text for a course in XML as it stands, perhaps in combination with a more introductory book about XML syntax like the XML Bible.

Sunday, January 27, 2002

Mark Hale's released version 0.877 of JSci, a class library containing many useful mathematical and scientific functions such as complex arithmetic. The major new feature in this release is partial support for the MathML Document Object Model (DOM) in conjunction with the Xerces parser.

Saturday, January 26, 2002

Version 1.3.1 of the OpenJade DSSSL processor has been released. DSSSL is a style language for SGML and XML documents. OpenJade contains backends for various formats (RTF, HTML, TeX, MIF, SGML2SGML, and FOT). This is a maintenance release that "pulls together the various patches and fixes that have been made and tested since the previous release of OpenJade, back in 1999. In addition, the new release supports new platforms and the latest GNU and Microsoft compilers". New features include:

  • Support for new platforms and architectures including MacOS X / Darwin and Cygwin, Intel IA64 under Linux (and S/390, PPC, Sparc, Alpha etc).
  • Support for GCC 2.95.3, Red Hat GCC 2.96, and GCC 3.0.
  • Support for Microsoft Visual C++ 6.0
  • Upgraded GNU source configuration tools (autoconf, configure script) to make openjade more portable and to reach more environments.
  • Improvements to the TeX backend including enhanced table support and working double sided output support
  • UNIX on-line manual (man) pages for the various tools in the package

Version 3.12 of JadeTeX has been released. JadeTeX is a set of TeX macros implementing the TeX output from the Jade/OpenJade DSSSL processor. It requires a reasonably up to date LaTeX installation. JadeTeX can produce PDF and PostScript (via DVI) documents. This release fixes some bugs and adds support for the Euro.

Friday, January 25, 2002

The W3C has published a note on Current Patent Practice. According to the abstract:

This current practice has evolved in order to satisfy the goal held by a number of W3C Members and significant parts of the larger Web community: that W3C Recommendations should be, as far as possible, implementable on a Royalty-Free basis [AC]. The current practice described here seeks to

  • establish Royalty-Free implementation as a goal for Recommendations produced by new and re-chartered Working Groups;
  • encourage maximum disclosure of patents that might prevent a W3C Recommendation from being implemented on a Royalty-Free basis;
  • provide a process for addressing situations in which the goal of Royalty-Free implementation may not be attainable.

This document relies on the definition of Royalty-Free licensing as described in the W3C Patent Policy Framework Last Call Working Draft [PATENT-POLICY]. Note that current W3C patent practice does not require any W3C Member to make a Royalty-Free licensing commitment for essential patents it may hold. Such a commitment is under discussion in the Patent Policy Working Group for possible inclusion in the final patent policy, but has not been implemented.

This seems to be heading in the right direction, though I think that member commitments to license essential patents are still necessary.


Simon St. Laurent's submitted a proposal to the IETF for Registration of xmlns Media Feature Tag. This proposes a new xmlns media feature for use in the Content-features MIME header. This could list the namespaces used in an XML document. For example,

Content-features: (|
  (xmlns="http://www.w3.org/1999/xhtml")
  (xmlns="http://www.w3.org/2000/svg")
  (xmlns="http://www.w3.org/1998/Math/MathML")
  (xmlns="http://www.w3.org/2001/SMIL20/")
  (xmlns="http://www.w3.org/1999/xlink")
)

Discussion is taking place on ietf-xml-mime.


Ronald Bourret's posted the first alpha of XML-DBMS version 2.0, open source middleware in both Java and Perl for transferring data between XML documents and a relational database. XML-DBMS uses an object-relational mapping in which complex element types are viewed as classes and simple element types, attributes, and PCDATA are viewed as properties. An XML-based mapping language is used to specify the mapping for a given class of documents. Version 2.0 is "almost a complete rewrite of version 1.01 and includes the following new features: updates (including insert-or-update), a filter language, heterogeneous joins, inlining elements (denormalizing XML structure), support for database-generated keys, connection and statement pooling, support for abstract types, and enhanced formatting capabilities."

Thursday, January 24, 2002

eCube has released catchXSL, a free-beer XSLT profiler written in Java.

Wednesday, January 23, 2002

I've posted The DOM Traversal Module, Chapter 12 of Processing XML with Java. This chapter covers the org.w3c.dom.traversal package including NodeIterator, NodeFilter, TreeWalker and DocumentTraversal. These classes aren't supported by all DOM implementations, but they're very convenient in those implementations that do support them. This is a shorter chapter than usual, only three main sections and about 15 printed pages. As usual all comments are appreciated.
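
To give a flavor of these interfaces, here is a small sketch that lists the element names in a document with a NodeIterator. It is not an example from the chapter. The class name ElementLister and its methods are hypothetical, and the code assumes the parser's Document object also implements DocumentTraversal, which it checks before casting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class ElementLister {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse document", ex);
        }
    }

    // Collect element names in document order using a NodeIterator
    static List<String> elementNames(Document doc) {
        // Not every DOM implementation supports the traversal module
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException(
                "This DOM implementation does not support traversal");
        }
        DocumentTraversal traversal = (DocumentTraversal) doc;
        NodeIterator iterator = traversal.createNodeIterator(
            doc.getDocumentElement(),   // start at the root element
            NodeFilter.SHOW_ELEMENT,    // visit only element nodes
            null,                       // no custom NodeFilter
            true);                      // expand entity references
        List<String> names = new ArrayList<String>();
        for (Node node = iterator.nextNode(); node != null; node = iterator.nextNode()) {
            names.add(node.getNodeName());
        }
        iterator.detach();              // release the iterator when finished
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(parse("<a><b/><c><d/></c></a>")));
    }
}
```

A TreeWalker is created the same way through createTreeWalker() and adds parent, child, and sibling navigation on top of the same filtering.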

Tuesday, January 22, 2002

The Apache XML Project has posted a release candidate of FOP 0.20.3, the popular open source XSL-FO-to-PDF converter. This release now works with XSL-FO documents that use the final recommendation syntax. That is, it expects to see a master-reference attribute instead of master-name. This is still far from a complete implementation of XSL-FO, but at least it should mostly work with most XSL 1.0 stylesheets.


IBM's alphaWorks has posted iSeries (AS400) binary downloads of the XML for C++ parser.

Monday, January 21, 2002

The W3C HTML Activity has posted the Last Call Working Draft of XForms 1.0. XForms is a new XML application that succeeds HTML forms. Comments are due by February 22.

Sunday, January 20, 2002

Simon St.Laurent posted version 0.4 of Gorille, an open source Java library for testing the content and names of structures in XML documents to see if they match lists of acceptable Unicode characters. Developers can create customized rules or rely on the rule files and classes provided for XML 1.0 and 1.1. Version 0.4 adds a code generator for building compilable rules classes.

Saturday, January 19, 2002

Jochen Wiedmann's posted the first public release of JaxMe, "yet another open source Java/XML binding tool in the style of Castor or Zeus." It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. It does support JDBC mapping to an SQL table and reading from joined tables.

The Breeze Factor has released version 3.0 of their XML Binder for Java, another one of numerous XML/Java Data-Binding Solutions. This one's $395 payware. New features since its previous incarnation as Breeze XML Studio 2.2 include:

  • Binding XML Schema to Java
  • Namespace support
  • DOM interoperability
  • Speed-ups
  • More Code-Generation Control
  • Automated compilation
  • Limited JAXB compatibility

My problem with these sorts of tools, of which JaxMe seems to be a fairly typical example, is that they start with the question of "How do I get an object/database record into an XML document?" rather than the much tougher question of "How do I get an XML document into an object?" This is a toothpaste problem: it's a lot easier to squirt XML out of a Java object than to push it back in. Most of these tools claim to be able to read XML documents into Java, but fail very quickly as soon as you start throwing real-world documents at them. Generally speaking the developers designing these tools are laboring under numerous faulty assumptions, including:

  • Documents have W3C XML Schema language schemas. (The vast majority don't.)

  • Documents have some kind of schema. (Many, perhaps most, don't.)

  • Documents that actually have schemas of some kind do in fact adhere to those schemas. (Often untrue)

  • You know the sorts of structures you're going to encounter before you see the documents. In other words, the documents are predictable. (Not an unreasonable assumption, but nonetheless often untrue in practice.)

  • Mixed content doesn't exist. (Patently false)

  • XML documents are fairly flat. In particular they have nearly tabular structures. (The database mapping folks tend to make this assumption. The object folks are a little less likely to fall into this particular trap.)

The fact is, XML documents considered in their full generality are extremely complicated. They are not tables. They are not objects. Any reasonable model for them has to take this complexity into account. You can certainly design mappings for XML to Java, but unless you're working in a very restricted domain, it's questionable whether you can come up with anything much simpler than JDOM. And if you are working in a restricted domain, then all you really need is a standard way of serializing and deserializing particular Java classes to and from a particular XML format. This can be almost hidden from the client programmer. For Java programmers who just want to serialize and deserialize I think tools like JaxMe are too complex. For those of us who need to work with XML as XML, it's not complex enough. XML binding tools like this fall right in the sour spot between simple and powerful, not easy enough to be simple, not sophisticated enough to be powerful.

Friday, January 18, 2002

Tom Bradford's Project Labrador "is an open source modular XML object request broker. It does not attempt to provide full-featured toolkit-style implementations of protocols that it supports, but attempts to be a simple, and easily configured XML Object broker that allows multiple protocol handlers, instance handlers, and service models." Labrador is written in Java.


The NetBeans XML team has posted the second release candidate of the NetBeans XML Project. NetBeans is a modular integrated development environment (IDE) written in Java. These modules add support for working with XML, DTD and CSS documents into the NetBeans IDE. Features include:

  • Text editor with syntax coloring and code completion,
  • Visual tree editor with customizable filtering views
  • Well-formedness and validity checking
  • Generators of XML, DTD, CSS, XHTML and Java outputs
  • Entity catalogs management

This is a bug fix release. NetBeans 3.3 or later is required.

Thursday, January 17, 2002

The W3C XML Schema Working Group has released a new version of the XML Schema Test Collection. This release features over 10,000 tests divided between the Structures and Datatypes specs.

Wednesday, January 16, 2002

The Apache XML Project has released version 2.2 of Xalan-J, an open source XSLT processor written in Java. Version 2.2 reimplements the internals of Xalan using the "Document Table Model". It also reorganizes the JAR files so that xalan.jar just includes the Xalan-Java implementation. SAX, DOM, and JAXP are in a separate xml-apis.jar so they can evolve independently and more easily be shared among different Apache projects. However, XSLT-wise there don't appear to be any significant new features in this release.

Tuesday, January 15, 2002

Steve Ball has posted a "theta release" of TclDOM 2.0, a Tcl language binding for the Document Object Model. TclDOM 2.0 supports DOM Level 1 and part of DOM Level 2.

Monday, January 14, 2002

I've posted The Document Object Model Core, Chapter 11 of Processing XML with Java. This chapter covers most of the interfaces in the org.w3c.dom package including Element, Attr, DocumentType, Comment, Text, ProcessingInstruction, CDATASection, CharacterData, Notation, Entity, and EntityReference. As usual all comments are appreciated.

Chapter 10, Creating XML Documents with DOM, is not yet finished. Some of the examples exposed bugs in various third party software that needs to be fixed. I've reported the bugs, and at least one vendor should fix the problem soon. I also need to fix some problems on my Sun Cobalt Qube at http://www.elharo.com to test some of the other examples. (The last kernel update apparently killed the administration interface.) In the meantime, there's probably enough content in Chapter 10 for you to learn what you need to know and continue on to Chapter 11. Just be warned that it's a lot rougher than what I normally announce and not all the examples may work quite as advertised. I wouldn't bother reporting any errors you spot in this chapter. Chances are I'm aware of them already and will be fixing them as quickly as possible.


The W3C DOM Working Group has posted new working drafts of the Document Object Model (DOM) Level 3 Core Specification and the Document Object Model (DOM) Level 3 Abstract Schemas and Load and Save Specification.

One interesting new development in this draft is feature normalization in the Document interface. Four new methods will support the setting and getting of normalization features in an implementation-independent way:

public void normalizeDocument();
public boolean canSetNormalizationFeature(String name, boolean state);
public void setNormalizationFeature(String name, boolean state) throws DOMException;
public boolean getNormalizationFeature(String name) throws DOMException;

Standard normalization features (which not all implementations may support) include:

  • normalize-characters: Perform the W3C Text Normalization of the characters in the document.
  • split-cdata-sections: Split CDATA sections containing the CDATA section termination marker ]]> and issue a warning.
  • expand-entity-references: expand EntityReference nodes
  • whitespace-in-element-content: Keep or discard all white space in element content in the document.
  • discard-default-content: Use whatever information available to the implementation (i.e. XML schema, DTD, the specified flag on Attr nodes, etc.) to decide what attributes and content should be discarded or not.
  • format-canonical: Canonicalize the document according to the rules specified by Canonical XML
  • format-pretty-print: Format the document by adding whitespace to produce a pretty-printed, indented, human-readable form.
  • namespace-declarations: Include or discard the namespace declaration attributes, specified or defaulted from the schema or the DTD, in the document.
  • validation: Use the abstract schema to validate the document as it is being normalized. If validation errors are found the error handler is notified.
  • external-parameter-entities: Load external parameter entities.
  • external-general-entities: Include all external general (text) entities.
  • external-dtd-subset: Load the external DTD subset and also all external parameter entities.
  • validate-if-schema: Validate only if the document being processed has some kind of schema (e.g. W3C XML Schema Language schema, DTD, Relax-NG, etc.)
  • validate-against-dtd: Prefer validation against the DTD over any other schema used with the document.
  • datatype-normalization: Let the (non-DTD) validation process do its datatype normalization that is defined in the schema language.
  • create-entity-ref-nodes: Create EntityReference nodes in the document.
  • create-entity-nodes: Create Entity nodes in the document.
  • create-cdata-nodes: Keep CDATASection nodes in the document.
  • comments: Keep Comment nodes in the document.
  • load-as-infoset: Only keep in the document the information defined in the XML Information Set
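
The four methods above are working draft signatures and may not be present in any parser you can actually run. As a rough sketch of the same idea, the code below assumes a DOM implementation that provides the org.w3c.dom.DOMConfiguration interface found in later versions of the Java DOM binding, where feature switches are spelled getDomConfig(), canSetParameter(), and setParameter(). The class name CommentStripper is hypothetical. The "comments" parameter corresponds to the comments feature in the list above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class CommentStripper {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse document", ex);
        }
    }

    // Ask the implementation to discard Comment nodes during normalization.
    // Returns false if the implementation refuses the setting.
    static boolean stripComments(Document doc) {
        DOMConfiguration config = doc.getDomConfig();
        // Check first, since not all implementations support every setting
        if (!config.canSetParameter("comments", Boolean.FALSE)) {
            return false;
        }
        config.setParameter("comments", Boolean.FALSE);
        // Normalization applies the configured settings to the tree in place
        doc.normalizeDocument();
        return true;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><!--note-->text</a>");
        System.out.println("Comments stripped: " + stripComments(doc));
    }
}
```

Checking canSetParameter() before setParameter() mirrors the canSetNormalizationFeature()/setNormalizationFeature() pairing in the draft.
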
Sunday, January 13, 2002

XMLmind has released xsdvalid, three command line tools for validating documents against W3C XML schema language schemas, validating documents against DTDs, and converting DTDs to schemas. Java 1.3 or later is required.

Saturday, January 12, 2002

The W3C Scalable Vector Graphics Working Group has published the second working draft of SVG 1.1, an XML application for vector graphics. New features in this draft include a W3C XML Schema for SVG, text wrapping, and geographic coordinate systems.

The SVG Working Group has decided against allowing graphical elements to be described using viewport coordinates in SVG 1.1 so that, for example, toolbars or map legends could be statically placed within the user agent viewport and be unaffected by zoom and pan. This will be revisited for SVG 2.0. Some other necessary features aren't even mentioned. I have to say that SVG strikes me as adequate for graphic arts, but completely unsuitable for technical drawings like CAD diagrams. An SVG document describes the picture that's going to be put on paper or screen rather than the real-world object the picture represents. E.g. an SVG document represents a picture of a house rather than the house itself.

The W3C SVG Working Group has also posted the first public working draft of Mobile SVG Profiles: SVG Tiny and SVG Basic. This describes two subsets of SVG for use in small devices like cell phones and Palm Pilots.

On a positive note, Apple's announced a royalty-free license for the patent they named early in the SVG 1.0 process. However, the W3C still won't let non-members read the license for this or other patent disclosures. Furthermore, contrary to popular belief, royalty-free is not the same as free-beer. Apple could offer a royalty-free license that cost several million dollars, provided only that the price was not based on the number of copies shipped. And of course there are still several claims from companies unwilling to offer even royalty-free licenses, including IBM, Quark and Kodak.

Friday, January 11, 2002

The NetBeans XML team has posted the first release candidate of the NetBeans XML modules family. NetBeans is a modular Integrated Development Environment (IDE) written in Java. Features include:

  • A text editor with syntax coloring and code completion,
  • A tree editor with customizable filtering views
  • A well-formedness and validity checker

This is primarily a bug fix release. NetBeans 3.3 is required.

Thursday, January 10, 2002

Opera Software has released Opera 5.0 for the Macintosh. This browser supports direct display of XML with attached CSS style sheets. Opera is a PowerPC-only classic app that requires MacOS 7.5.3 or later. (A MacOS X beta is also available.) Pricing is the same as for the Windows version, free-beer with ads or $39 without ads.


Wolfgang Meier of the Darmstadt University of Technology has posted version 0.7 of the eXist open source XML database. eXist is a native XML database that supports fulltext search. XML can be stored in either the internal, native XML-DB or an external relational database. The search engine has been designed to provide fast XPath queries, using indexes for all element, text and attribute nodes. The server is accessible through HTTP and XML-RPC interfaces and supports the XML:DB API for Java programming. Version 0.7 significantly improves stability and performance. The native storage backend is now secure for concurrent write access by multiple clients. New features include experimental support for XSL on the server.

Wednesday, January 9, 2002

Jochen Wiedmann's released JaxMe 1.0.3, yet another open source Java/XML binding tool based on SAX and schemas. This version adds a generator for entity beans with BMP.


John Cowan's updated the Itsy Bitsy Teeny Weeny Simple Hypertext (IBTWSH) DTD to version 6.1. IBTWSH is a subset of XHTML Basic for embedded use within other XML applications. This release adds a meta element, and fixes an error in the ATTLIST declaration for the ul element.


The NetBeans XML team has posted the second Beta release of the NetBeans XML modules family. NetBeans is a modular Integrated Development Environment (IDE) written in Java. Features include:

  • A text editor with syntax coloring and code completion,
  • A tree editor with customizable filtering views
  • A well-formedness and validity checker

This is primarily a bug fix release. NetBeans 3.3 is required.

Tuesday, January 8, 2002

alphaXML Ltd is running an XML Limerick competition:

Prizes from Altova and Wrox
For rhymes about XPath and SOX
All you must do
Is rhyme like I do
And wait for your seasonal box

Prizes include Altova's XML Spy Suite (courtesy of Altova) and Wrox Press's Professional ebXML Foundations.


Norm Walsh has updated several of his DocBook applications including:


Tenuto is an open source RelaxNG validator written in C# that runs on .NET Framework (Beta2).

Monday, January 7, 2002

Version 1.1 of the XSLT Standard Library has been released. This is a collection of commonly-used templates written purely in XSLT. I use the date templates from this library in the stylesheets for Processing XML with Java. Version 1.1 adds:

  • New internationalization functionality in the string module.
  • The markup module for generating XML text.
  • Various bug fixes and patches

xsltsl is open source published under the LGPL.

Sunday, January 6, 2002

Daisuke Okajima has updated RelaxNGCC, a tool for generating Java source code from a given RelaxNG grammar. New features in this release include:

  • Changed RelaxNG namespace URI to version 1.0.
  • Modified the procedure of interleave correctly.
  • Support for name-classes.
  • Changed non-XML syntax features for James Clark's new annotation support.

Heinz Detlev Koch & Roman Halstenberg, electronic publishing consulting GbR has released version 1.1 of epcEdit, a $349 payware XML and SGML editor for Windows, Linux, and Solaris. New features include spell checking, support for ignorable white space, improved attribute editing, and more. Upgrades are free for registered users of epcEdit 1.0.


IBM's alphaWorks has released Web Services Toolkit 3.0. This provides the "basic software components needed to create a Web services environment" including:

  • Utility services for common infrastructure including metering, accounting, contract, common data, notification, and identity services.
  • A WSDLdoc utility (like Javadoc) that parses a WSDL document and produces HTML documentation that describes the Web service operations
  • AXIS alpha3 (a.k.a. Apache SOAP 3.0)
  • SoapConnect for Lotus Script, a partial implementation of SOAP 1.1 for LotusScript.
  • Implementation of the X-KRSS (Key Registration Service Specification) part of the XKMS (XML Key Management) specification.
  • UDDI4J Version 2 preview: WSTK has included a technical preview of the UDDI4J Version 2 code. This client-side Java API communicates with the recently announced UDDI Version 2 registries from IBM, Microsoft, HP, and SAP. UDDI4J Version 2 conforms to the UDDI Version 2 specification.
  • HTTPR (Reliable HTTP) technology update that supports arbitrary byte streams and asynchronous client interfaces via the Java Message Service (JMS) and the MQSeries API usages. "HTTPR ensures that a message gets delivered over the Internet to its destination application only once or that it gets reported as undeliverable."
  • The Web Services Stack (WSS), a lightweight application server based on the WebSphere 4.01 code base.
  • A lightweight UDDI implementation.
  • WS-Inspection (Web Services Inspection), an XML application that "defines a process for service discovery by aggregating existing Web service description documents within the confines of a standard Web server."

Java 1.3 or later is required.


Peter Flynn's posted version 2.1 of the XML FAQ. Changes in this version include a new question on root elements, more references for XML and databases, a pointer to the Namespaces FAQ, corrections of some misunderstandings about character encodings, an update of XLink to W3C Recommendation status, and translations into German and Amharic.

Saturday, January 5, 2002

IBM's alphaWorks has released XML for C++ 4.0, an XML parser written in fairly generic C++. Version 4.0 adds "complete support for W3C Schema Recommendation 1.0" and various other additions, improvements, and bug fixes.

Friday, January 4, 2002

Vancouver-based UFIL Unified Data Technologies is claiming that it owns a patent that covers the Resource Description Framework. It was awarded U.S. patent 5,684,985, a "Method and apparatus utilizing bond identifiers executed upon accessing of an endo-dynamic information node" in November 1997. My initial take is that the patent does not cover RDF, because it does not describe the containment relationship that is at the heart of all XML documents. In fact, the patent specifically distinguishes itself from other methods by avoiding containment:

logical organization exists when the information atoms are associated in different ways to produce a structure. In current information organization methodologies, the only kind of association between two atoms of information is containment. This is true no matter how evolved the containment method may be. For example, in object-oriented programming methods, objects can have associated predefined processes, where this is accomplished by the object containing the processes or references to those processes.

However, RDF is a very weird beast in the family of XML specs, so I'm not at all sure of this.

It's not clear if UFIL actually does anything except file patents and attempt to extort royalties. Their web site (www.udtl.com) is offline. The whois record points to what sounds like a mailbox store. They've brought in Patent Enforcement and Royalties Limited, a public company that specializes in licensing patents to sue other companies, to pursue litigation against companies that are claimed to be infringing this patent. UFIL is not a member of the W3C so any W3C patent policy regarding what rights members must grant to their patents is irrelevant here.

Thursday, January 3, 2002

The Apache XML Project has posted the fourth beta of Xerces-J 2.0, a ground-up rewrite of the popular open source XML parser for Java. This may be the "last beta release of Xerces-J 2.0 before a stable version becomes available. This release fixes a number of bugs, introduces more changes to the Xerces Native Interface, provides partial experimental DOM Level 3 implementation, and includes full XML Schema support."


Anthony B. Coates's xmLP 1.0 is an open source (LGPL) literate programming tool for XML, written as a set of XSLT scripts.

Wednesday, January 2, 2002

V Lakshman's open source streamdom is a Java class that converts SAX events into events for one or more DOM Element handlers provided. The event-handling callbacks get to process entire elements complete with children and attributes but not siblings. Thus, your code can process streaming data using DOM with a small memory footprint.

Tuesday, January 1, 2002

Gnumeric 1.0 has been released. Gnumeric is a graphical spreadsheet for Linux/Gnome that uses XML as its native storage format.



Copyright 2002 Elliotte Rusty Harold
elharo@ibiblio.org
Last Modified January 14, 2003