2003 XML News

Wednesday, December 31, 2003

Mulberry Tech has posted the Call for Participation for Extreme Markup Languages 2004 which takes place from August 2-6, 2004 in Montréal. Extreme is a technical conference devoted to markup, markup languages, markup systems, markup applications, and software for manipulating and exploiting markup. At Extreme Markup Languages, software developers, tag set designers, librarians, computer scientists, linguists, markup theorists, taxonomists, publishers, lexicographers, typographers, and other XML brick layers and pipe-fitters devote the better part of a week to discussing questions like:

  • Is XML sufficiently invisible to be considered a success?
  • Will RDF or Topic Maps ever take off?
  • Who said documents went away?
  • Do you FO?
  • Will Office 2003 change everything? Anything?
  • How does a Topic Map mean?
  • Got metadata?
  • Will the schema language chaos hurt XML?
  • Should we sign a tag set non-proliferation pledge?
  • Can validation co-exist with versioning?

David Tolpin has released RNV 1.2, an open source Relax NG Compact Syntax validator written in ANSI C. Version 1.2 adds partial support for W3C XML Schema datatypes. RNV is published under a BSD license.

Tuesday, December 30, 2003

The W3C Voice Browser Working Group has posted the Candidate Recommendation of the Speech Synthesis Markup Language Specification. According to the abstract, the Speech Synthesis Markup Language "is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms."
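
For a concrete picture, here's a minimal Python sketch that builds a tiny SSML document with the standard library's ElementTree. It assumes the SSML 1.0 namespace URI `http://www.w3.org/2001/10/synthesis` and the `prosody` element's `rate`, `pitch`, and `volume` attributes. The `build_ssml` helper and the sample sentence are hypothetical, and a real synthesizer may honor only some of these controls.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML 1.0 namespace (assumed)
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of the xml:lang attribute

# Serialize SSML elements without a prefix, i.e. as the default namespace.
ET.register_namespace("", SSML_NS)

def build_ssml(text: str, rate: str = "slow", pitch: str = "high",
               volume: str = "loud") -> str:
    """Wrap text in <speak><prosody> so a synthesizer can adjust its delivery."""
    speak = ET.Element(f"{{{SSML_NS}}}speak",
                       {"version": "1.0", f"{{{XML_NS}}}lang": "en-US"})
    prosody = ET.SubElement(speak, f"{{{SSML_NS}}}prosody",
                            {"rate": rate, "pitch": pitch, "volume": volume})
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")

print(build_ssml("Hello from the speech synthesizer."))
```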

Monday, December 29, 2003

The W3C Multimodal Interaction working group has posted the second public working draft of EMMA: Extensible MultiModal Annotation markup language. According to the abstract, "This document is part of a set of specifications for multi-modal systems, and provides details of an XML markup language for describing the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from a speech or pen input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers."

Sunday, December 28, 2003

Eric S. Raymond has released version 1.4 of doclifter, an open source tool that transcodes {n,t,g}roff documentation to DocBook. He claims the "result is usable without further hand-hacking about 95% of the time." This release handles .TQ reduction, translates attempts to fake double quotes in text with `` and '', and catches a few more .RS/.RE cases. Doclifter is written in Python, requires Python 2.2a1, and is published under the GPL.
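
The fake-quote fix is the easiest of these to picture: troff authors often type two backquotes and two apostrophes to imitate typographic double quotes. Here's a minimal Python sketch of that single rewrite. It assumes the target is DocBook's `<quote>` element and that the text has already been XML-escaped; it's illustrative only, not how doclifter itself does it.

```python
import re

# ``quoted text'' -> <quote>quoted text</quote>
# The non-greedy group keeps two quotations on one line from merging into one.
FAKE_QUOTES = re.compile(r"``(.*?)''")

def lift_fake_quotes(line: str) -> str:
    """Replace troff-style ``...'' pairs with DocBook <quote> elements."""
    return FAKE_QUOTES.sub(r"<quote>\1</quote>", line)

print(lift_fake_quotes("Use ``-v'' for ``verbose'' output."))
# Use <quote>-v</quote> for <quote>verbose</quote> output.
```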

Saturday, December 27, 2003

The W3C Cascading Style Sheets working group has posted two related last call working drafts about printing with CSS: CSS Print Profile and CSS3 Paged Media Module. The CSS Print Profile "defines a subset of Cascading Style Sheets Level 2 [CSS2] and CSS3 module: Paged Media [PAGEMEDIA] specifically for printing to low-cost devices. It is designed for printing from mobile devices, where it is not feasible or desirable to install a printer-specific driver, and for situations where some variability between the device's view of the document and the formatting of the output is acceptable."

The CSS3 Paged Media Module "describes the page model that partitions a flow into pages. It builds on the CSS3 Box model module and introduces and defines the page model and paged media. It adds functionality for pagination, page margins, headers and footers, image orientation. Finally it extends generated content for the purpose of cross-references with page numbers." Comments are due by January 31 for both.

Thursday, December 25, 2003

The W3C Web Services Internationalization Task Force has published the first public Working Draft of Requirements for the Internationalization of Web Services. According to the intro,

A Web Service is a software application identified by a URI [RFC2396], whose interfaces and binding are capable of being defined, described and discovered by XML artifacts, and which supports direct interactions with other software applications using XML based messages via Internet-based protocols. The full range of application functionality can be exposed in a Web service.

The W3C Internationalization Working Group, Web Services Task Force, was chartered to examine Web Services for internationalization issues. The result of this work is the Web Services Internationalization Usage Scenarios document [WSUS]. Some of the scenarios in this document demonstrate that to achieve worldwide usability, internationalization options must be exposed in a consistent way in the definitions, descriptions, messages, and discovery mechanisms that make up Web services.

Mostly this document proposes new SOAP features for expressing various user preferences for language, number sorting, date formats, etc. However, it also suggests the possibility of extending language tags (en, en-US, en-US-LA, en-US-LA-YAT) with other locale information such as time zone and collation preferences. At first read, that doesn't sound like a very good idea to me. I think these ought to be separate, independently parseable items rather than tying them all together in one unmarked-up string.
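
To illustrate the alternative, here's a hedged Python sketch in which each preference stays a separate, independently parseable item instead of being packed into extra language-tag subtags. The `LocalePreferences` class and its `time_zone` and `collation` fields are hypothetical, not taken from the draft.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class LocalePreferences:
    """Each preference is its own field, so each can be parsed and validated separately."""
    language: str                    # an ordinary language tag such as "en-US"
    time_zone: Optional[str] = None  # e.g. an Olson zone name like "America/New_York"
    collation: Optional[str] = None  # hypothetical collation identifier

    def primary_language(self) -> str:
        # The first subtag of a language tag is the primary language.
        return self.language.split("-")[0].lower()

prefs = LocalePreferences("en-US", time_zone="America/New_York", collation="phonebook")
print(prefs.primary_language())  # en

# Changing one preference leaves the language tag untouched.
travelling = replace(prefs, time_zone="Europe/Amsterdam")
print(travelling.language)  # en-US
```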

Wednesday, December 24, 2003

The W3C Web Ontology Working Group has issued proposed recommendations of all six of its specifications.

Quoting from the overview document,

The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has three increasingly-expressive sublanguages: OWL Lite, OWL DL, and OWL Full.

Comments on all six are due by January 19.

Tuesday, December 23, 2003

As usual I'll be away for the holidays for the next week or so. Updates are likely to be a little sparse until I return.


Sleepycat Software has released Berkeley DB XML 1.2.0, an open source (non-GPL but still viral) "application-specific, embedded data manager for native XML data" based on Berkeley DB. It includes C++ and Java APIs and supports XPath 1.0. Version 1.2.0 adds an API for in-place modification of documents.


Dave Malcolm has posted Conglomerate 0.7.8, an open source GUI XML editor for Linux written in C, and based on libxml2 and the GTK+ and Gnome libraries. This release fixes bugs and reorganizes the code. Conglomerate is published under the GPL.

Monday, December 22, 2003

The W3C DOM Working Group has posted the Proposed Recommendation of DOM Level 3 Validation. "This specification defines the Document Object Model Validation Level 3, a platform- and language-neutral interface. This module provides the guidance to programs and scripts to dynamically update the content and the structure of documents while ensuring that the document remains valid, or to ensure that the document becomes valid." On my initial perusal of the draft, I didn't notice any code-level changes since the candidate recommendation. Comments are due by January 14.

Sunday, December 21, 2003

The W3C XQuery working group has published a new working draft of XML Syntax for XQuery 1.0 (XQueryX), the first update of this spec in more than two years. This defines a very ugly, pure XML-based syntax for XQuery that doesn't even use normal XPath syntax. Frankly, this syntax is so incredibly horrid that one suspects the working group deliberately sabotaged the effort in order to silence all the critics who've been complaining that XQuery is only pseudo-XML. However, XSLT is proof by example that XML syntaxes for Turing-complete, XML-processing languages don't need to be illegible. XQuery could be written in XML. The working group has simply chosen not to enable that in any real way. Neither XQueryX nor the pseudo-XML of real XQuery is an acceptable solution. The W3C should either abandon XML syntax for this effort completely in favor of something along the lines of the RELAX NG compact syntax, or define a genuine, usable XML syntax like XSLT's. The current XQuery syntaxes are just condescending, convoluted, and confusing.


Brendan Macmillan has posted version 2.1.4doc2 (whatever that means) of Java Serialization for XML (JSX) 2, a library for converting Java objects into streams of XML and reading the objects back from the streams. To use it, replace ObjectOutputStream with JSX.ObjectWriter and ObjectInputStream with JSX.ObjectReader. This is a bug fix release.

Saturday, December 20, 2003

Sun has posted version 0.2.4 of xmlroff, an open source XSL Formatting Objects to PDF converter. xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib, GObject and Pango libraries from GTK+ and GNOME (though neither GTK+ nor Gnome is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This release fixes various bugs.

Friday, December 19, 2003

David Tolpin has released RNV 1.1.0, an open source Relax NG Compact Syntax validator written in ANSI C. RNV is published under a BSD license.

Thursday, December 18, 2003

The XSL Working Group has published the first working drafts of Extensible Stylesheet Language (XSL) Version 1.1 Requirements and Extensible Stylesheet Language (XSL) Version 1.1. Note that these cover what is normally referred to as XSL Formatting Objects, not XSL Transformations. According to the requirements abstract,

Since becoming a Recommendation on 15 October 2001, XSL 1.0 has enjoyed widespread support. However, the user community has expressed requirements that have encouraged various implementations to provide extensions to the language. These extensions--especially those implemented by more than one implementation--are clear candidates for standardization so as to maximize interoperability.

The XSL Working Group has surveyed and analyzed various existing extensions, user requirements, and features intentionally cut from XSL 1.0 due to lack of time. Using the results of this research, the Working Group is developing an XSL 1.1 version that incorporates current errata and includes a subset of relatively simple and upward compatible additions to XSL.

New features planned for XSL-FO 1.1 are:

  1. Change bars
  2. Index improvements, especially merging page numbers
  3. Conditional graphic scaling, e.g., "scale-down-to-fit"
  4. Table of contents windows (aka bookmarks)
  5. Table markers that allow dynamically determined text to be put into table headers or footers
  6. Support for a value of "only" for the page-position property
  7. Support for a page-number-citation-last formatting object (retrieving the last page number of a section or document)
  8. Support for "flowmaps" and other region/float extensions
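
The "scale-down-to-fit" idea in item 3 boils down to simple arithmetic: shrink a graphic only when it's too big for its area, and never enlarge it. Here's a minimal Python sketch of that rule, assuming uniform scaling that preserves the aspect ratio. The function name and the use of points as units are illustrative, not XSL 1.1 property syntax.

```python
def scale_down_to_fit(img_w: float, img_h: float, box_w: float, box_h: float) -> float:
    """Return the scale factor to apply: at most 1.0, so never an enlargement."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # The tighter of the two constraints wins, so the image fits in both directions.
    return min(1.0, box_w / img_w, box_h / img_h)

# A 600pt x 300pt image in a 450pt x 700pt area is scaled to 75%.
print(scale_down_to_fit(600, 300, 450, 700))  # 0.75
# A small 100pt x 50pt image is left at its intrinsic size.
print(scale_down_to_fit(100, 50, 450, 700))   # 1.0
```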

Wednesday, December 17, 2003

Sun has posted the first public review draft specification of the Java API for XML Processing 1.3. New features in this release include:

  • XPath support
  • Optional XInclude support
  • SAX 2.0.1 with extensions
  • XML 1.1 support
  • DOM Level 3
  • A javax.xml.Version class for getting information about the implementation
  • The javax.xml.validation package for schema-language-agnostic validation
  • A SecureProcessing class that sets limits on entity expansion and maximum node occurrences in schemas.
  • An XMLConstants class that provides various common namespace URIs
  • An XMLUtils class for checking XML names

There's some interesting new stuff here. However, a lot of old rough edges have not yet been smoothed out. For instance, under the description of DOMSource, we read, "The model of how the Transformer deals with the DOM tree in terms of mismatches with the XSLT data model or other data models is beyond the scope of this document." This is exactly the sort of problem the spec needs to address head on if true interoperability between different implementations is to be possible.

The spec itself is written in DocBook. It's published more openly on java.net, as well as in the usual license-encumbered publication in the Java Community Process.

Tuesday, December 16, 2003

The W3C Resource Description Framework (RDF) Core Working Group has published six proposed recommendations. This set of six replaces the original two Resource Description Framework specifications from 1999, RDF Model and Syntax and RDF Schema. The six new specs are:

RDF Primer
According to the introduction,

The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web. Examples include information about items available from on-line shopping facilities (e.g., information about specifications, prices, and availability), or the description of a Web user's preferences for information delivery.

RDF is intended for situations in which this information needs to be processed by applications, rather than being only displayed to people. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created.

RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource Identifiers, or URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values.

Resource Description Framework (RDF): Concepts and Abstract Syntax
This document "defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, key concepts, datatyping, character normalization and handling of URI references."
RDF Semantics
"This is a specification of a precise semantics, and corresponding complete systems of inference rules, for the Resource Description Framework (RDF) and RDF Schema (RDFS)."
RDF Vocabulary Description Language 1.0: RDF Schema
"This specification describes how to use RDF to describe RDF vocabularies. This specification defines a vocabulary for this purpose and defines other built-in RDF vocabulary initially specified in the RDF Model and Syntax Specification."
RDF/XML Syntax Specification (Revised)
"This document defines an XML syntax for RDF called RDF/XML in terms of Namespaces in XML, the XML Information Set and XML Base. The formal grammar for the syntax is annotated with actions generating triples of the RDF graph as defined in RDF Concepts and Abstract Syntax. The triples are written using the N-Triples RDF graph serializing format which enables more precise recording of the mapping in a machine processable form. The mappings are recorded as tests cases, gathered and published in RDF Test Cases."
RDF Test Cases
This document describes a set of machine-processable test cases for RDF, though it does not contain the test cases themselves, which are available separately.

Changes since the October last call working drafts are mostly editorial in nature. Comments on all six are due by January 19.
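
The triple model the Primer describes is easy to sketch. In the Python below, a graph is just a set of (subject, predicate, object) tuples, serialized in the line-oriented N-Triples style the RDF/XML spec mentions. This is a simplified sketch. The example.org URI is hypothetical, the Dublin Core properties are only illustrative, and literal escaping covers just backslashes, double quotes, and newlines, not the full N-Triples rules.

```python
from typing import NamedTuple, Union

class URIRef(str):
    """A URI reference, written <...> in N-Triples."""

class Literal(str):
    """A plain literal, written "..." in N-Triples."""

Term = Union[URIRef, Literal]

class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    obj: Term

def term_to_nt(term: Term) -> str:
    if isinstance(term, URIRef):
        return f"<{term}>"
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(graph: set) -> str:
    # One statement per line, ending in " ."; sorted only to make output deterministic.
    lines = [f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
             for t in graph]
    return "\n".join(sorted(lines))

page = URIRef("http://www.example.org/index.html")
DC = "http://purl.org/dc/elements/1.1/"
graph = {
    Triple(page, URIRef(DC + "creator"), Literal("Jane Doe")),
    Triple(page, URIRef(DC + "title"), Literal('The "Home" Page')),
}
print(to_ntriples(graph))
```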

Monday, December 15, 2003

The W3C Web Content Accessibility Guidelines Working Group has posted the first public working draft of HTML Techniques for WCAG 2.0. "This document provides information to Web content developers who wish to satisfy the success criteria of 'Web Content Accessibility Guidelines 2.0' [WCAG20] (currently a Working Draft). It includes techniques, code examples, and references to help authors satisfy each success criterion. The techniques in this document are specific to Hypertext Markup Language content [HTML4], [XHTML1] although some techniques contain Cascading Style Sheet solutions [CSS1]. Deprecated examples illustrate techniques that content developers should not use. The techniques listed in this document are suggestions on how to conform to WCAG 2.0. However, they may not be the only way to satisfy each success criterion."


RenderX has released version 3.6.4 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. Version 3.6.4 is a bug fix release. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $4999.95. Updates from 3.0 are free.

Sunday, December 14, 2003

Dave Malcolm has posted Conglomerate 0.7.7, an open source GUI XML editor for Linux written in C, and based on libxml2 and the GTK+ and Gnome libraries. Besides bug fixes, this release adds support for localizing XML element names and descriptions using the xml:lang attribute and intltool, and the ability to ignore unnecessary whitespace text nodes (whatever those are) in the main editor view. Conglomerate is published under the GPL.

Saturday, December 13, 2003

Andy Clark has posted version 0.83 of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This release fixes a few bugs in the HTML and DTD parsers. Other tools in the package include a generic XML pull parser, a RELAX NG validator, and a DTD to XML converter.

Friday, December 12, 2003

The W3C Scalable Vector Graphics Working Group has posted the first working drafts of SVG Tiny Version 1.2 Requirements and Mobile SVG Profiles: SVG Tiny and SVG Basic, Version 1.2. SVG Tiny 1.2 is a "backwardly-compatible update of SVG Tiny 1.1 which adds some new features from SVG 1.2 and adds other features based on implementor and designer feedback on SVG Tiny 1.1."

Thursday, December 11, 2003

The W3C Technical Architecture Group (TAG) has published the last call working draft of Architecture of the World Wide Web, First Edition. This describes how URIs, HTTP, and XML should and should not be used. Here are just the principles, constraints, and good practices, extracted from the draft:

  • Silent recovery from error is harmful.
  • The identification mechanism for the Web is the URI.
  • A resource owner SHOULD assign a URI to each resource that others will expect to refer to.
  • Web architecture does not constrain a Web resource to be identified by a single URI.
  • Resource owners should not create arbitrarily different URIs for the same resource.
  • If a URI has been assigned to a resource, agents SHOULD refer to the resource using the same URI, character for character.
  • Avoid URI ambiguity.
  • Authors of specifications SHOULD NOT introduce a new URI scheme when an existing scheme provides the desired properties of identifiers and their relation to resources.
  • Agents making use of URIs MUST NOT attempt to infer properties of the referenced resource except as licensed by relevant specifications.
  • A resource owner who creates a URI with a fragment identifier and who uses content negotiation to serve multiple representations of the identified resource SHOULD NOT serve representations with inconsistent fragment identifier semantics.
  • User agents MUST NOT silently ignore authoritative server metadata.
  • Agents do not incur obligations by retrieving a representation.
  • Publishers of a URI SHOULD provide representations of the identified resource consistently and predictably.
  • Publishers of a URI SHOULD provide representations of the identified resource.
  • Format designers SHOULD provide for version information in language instances.
  • Format designers SHOULD document change policies for XML namespaces.
  • Language designers SHOULD provide mechanisms that allow any party to create extensions that do not interfere with conformance to the original specification.
  • Language designers SHOULD specify agent behavior in the face of unrecognized extensions.
  • Language designers SHOULD design formats that allow authors to separate content from presentation and interaction concerns.
  • Language designers SHOULD provide mechanisms for identifying links to other resources and to portions of representation data (via fragment identifiers).
  • Language designers SHOULD provide mechanisms that allow Web-wide linking, not just internal document linking.
  • Language designers SHOULD allow authors to use URIs without constraining them to a limited set of URI schemes.
  • Language designers SHOULD incorporate hypertext links into a data format if hypertext is the expected user interface paradigm.
  • Language designers who create new XML vocabularies SHOULD place all element names and global attribute names in a namespace.
  • Resource owners who publish an XML namespace name SHOULD make available material intended for people to read and material optimized for software agents in order to meet the needs of those who will use the namespace vocabulary.
  • Specifications that use QNames to represent URI/local-name pairs SHOULD NOT allow both forms in attribute values or element content where they would be indistinguishable from URIs.
  • Language designers who use QNames as identifiers of Web resources MUST provide a mapping to URIs.
  • In general, server managers SHOULD NOT assign Internet Media Types beginning with "text/" to XML representations.
  • In general, server managers SHOULD NOT specify the character encoding for XML data in protocol headers since the data is self-describing.

Comments are due by March 31, 2004.
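
The last principle rests on XML's built-in encoding detection: a parser can work out the encoding from the document's byte order mark or its encoding declaration. Here's a simplified Python sketch of that sniffing, loosely following Appendix F of the XML 1.0 Recommendation. It only handles UTF-8 and UTF-16 byte order marks plus ASCII-compatible encodings that carry an XML declaration, and the `sniff_xml_encoding` name is hypothetical.

```python
import codecs
import re

# Matches encoding="..." (or '...') inside an XML declaration at the start of the data.
XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")

def sniff_xml_encoding(data: bytes) -> str:
    """Guess a document's encoding from its own bytes, with no protocol header."""
    # 1. A byte order mark settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    # 2. Otherwise read the encoding declaration (ASCII-compatible encodings only).
    match = XML_DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 3. No BOM and no declaration: XML 1.0 requires UTF-8.
    return "UTF-8"

print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))  # ISO-8859-1
```

Because the answer comes from the bytes themselves, the same check works whether the document arrives over HTTP, from disk, or out of a database.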


Daniel Veillard has released version 2.6.3 of libxml2, the open source XML C library for Gnome. This release supports the latest XInclude working draft syntax.


The W3C XML Core Working Group has published a proposed edited recommendation of the XML Information Set. The changes are very minor overall. A lot of specific references to XML 1.0 have been deleted, and some very obscure points on what happens when a notation is declared multiple times have been clarified. This shouldn't have much effect on anybody.

Wednesday, December 10, 2003

Microsoft has posted the schemas for the XML file formats used by Microsoft Office 2003, including Word, Excel, and InfoPath. Of course, the download is a Windows executable rather than something standard like a zip file.


The first beta of Mozilla 1.6 has been posted. Mozilla is an open source web browser that supports XML, CSS, XSLT, XUL, HTML, XHTML, MathML, SVG, and lots of other crunchy XML goodness. It's available for Linux, Mac OS X, and Windows. This drop adds support for NTLM authentication on all platforms, an automatic page translation feature via Google Language Tools, and ChatZilla 0.9.48. And of course, various bugs are fixed.


Antenna House, Inc. has posted bug fix releases for XSL Formatter 3.0 and 2.5. Specifically, these are 3.0MR1 and 2.5MR5. XSL Formatter is $1250 payware for a single user Windows license, plus another $100 if you want hyphenation support, plus royalty fees if you want GIF or TIFF support. Linux/Unix prices start at $3000.


Tuesday, December 9, 2003

Version 2.5.0 of librsvg, the open source Gnome SVG rendering library, has been released. This release fixes assorted bugs.

Sunday, December 7, 2003

The XQuark project has released XQuark Fusion 1.0.0 and XQuark Bridge 1.0.0. XQuark Fusion is an "information integration engine, based on XQuery, for querying in real-time multiple, heterogeneous and distributed data sources." Data sources include relational databases wrapped through XQuark Bridge and XML files. "XQuark Bridge expands existing relational database functionalities with advanced XML import/export capabilities. It supports flexible extraction and publishing of relational data into any target XML format, using the XQuery language. Using a powerful mapping language, it can also perform efficient insertion of structured XML data into existing relational tables, while taking into account the database integrity constraints and transforming the implicit relations appearing in the XML document into explicit ones in the database." Both are published under the LGPL.

Saturday, December 6, 2003

KXParse 2.3, an XML parser written in PHP, has been released. KXParse is published under the GNU General Public License (GPL).

Friday, December 5, 2003

Howard Katz has posted version 0.61 of XQEngine, an open-source, Java-based query engine for collections of XML documents. According to Katz,

Utilizing XQuery as its front-end query language, it lets you interrogate collections of XML documents for boolean combinations of keywords, much as Google and other search engines let you do for HTML. XQuery, however, provides much more powerful search capabilities than equivalent HTML-based engines, since its XPath component lets you specify constraints on attributes and element hierarchies, in addition to the specific word content you're searching on. Refer to the W3C's XML Query website to see what the W3C and other vendors are doing with XQuery and XPath.

XQEngine is a compact (roughly 250K) embeddable component written in Java. It's not a standalone application and requires a reasonable amount of Java programming skill to use. It has a straightforward programming interface that makes that fairly easy to do. It's single-threaded and should work well as a personal productivity tool on a single desktop, as part of a CD-based application, or on a server with low to moderate traffic. (Making the engine thread-capable is not overly difficult and remains a future project.)
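
To see why structural constraints add power over plain keyword search, here's a small Python sketch using the standard library's ElementTree, which supports a limited XPath subset. This is not XQEngine's API, which is Java and uses XQuery; the sample catalog and the `search` helper are hypothetical. It combines a structural constraint (only books whose `lang` attribute is `en`) with a boolean AND of keywords over the title text.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <book lang="en"><title>XML in a Nutshell</title></book>
  <book lang="fr"><title>XML et Java</title></book>
  <book lang="en"><title>Processing XML with Java</title></book>
</catalog>"""

def search(xml_text: str, path: str, keywords: list) -> list:
    """Return the text of elements matching `path` that contain every keyword."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.findall(path):
        words = (elem.text or "").lower().split()
        if all(k.lower() in words for k in keywords):  # boolean AND of keywords
            hits.append(elem.text)
    return hits

# Keywords alone match two titles; the attribute constraint narrows it to one.
print(search(CATALOG, "./book/title", ["xml", "java"]))
print(search(CATALOG, "./book[@lang='en']/title", ["xml", "java"]))
```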

Thursday, December 4, 2003

While googling around on an unrelated topic yesterday, I discovered WaMCoM Mozilla. "The intention of WaMCom.org is to produce web browser and mail client software that is more stable and more correct than the test releases produced by the Mozilla.org organization, in the hope it is suitable for end users. In order to achieve that, stable Mozilla releases are extended with correctness fixes. In addition it contains some security and cryptography enhancements."

Most important for my purposes, WaMCoM makes Mozilla 1.3.1 available for Mac OS 9. In this morning's testing, it appears substantially more stable than the Mozilla 1.2.1 I had been using. The Mozilla Project dropped support for Mac OS 9 after 1.2.1, a version that has an annoying tendency to crash when more than a dozen or so windows are open. I need to spend a couple more weeks with this to be sure, but I've been happily surfing this morning using WaMCoM with dozens of windows open simultaneously, and right now it looks like all Mac OS 9 users should upgrade. This may let me put off buying a new desktop Mac and switching to OS X for a few more months. Hmm, looks like I spoke too soon. I just managed to crash it by opening a few dozen pages at the New York Times. Still, it may be a bit more stable than 1.2.1.


Micah Dubinko has written XForms Essentials, an introduction to XForms, "a combination of two of the most successful experiments ever performed with the Web: XML and forms." The book is freely available online and for $29.95 on paper (usual discounts apply). It's published under the GNU Free Documentation License. According to Dubinko,


You should read this book if you want to:
  • Create XForms files in a text or XML editor
  • Convert existing forms (electronic or paper) to XForms
  • Collect XML data from users in a user-friendly way
  • Reduce the amount of JavaScript needed within browser interfaces
  • Increase the security and reliability of your current information system by combining client-side and server-side checks into a common code base
  • Understand how to create interactive web sites using the latest standard technology

Wednesday, December 3, 2003

The Apache XML Project has released version 2.4.0 of Xerces-C, the popular open source parser written in C++. New features in this release include:

  • An API that provides access to the Post-Schema Validation Infoset
  • Persistent, thread-safe grammars
  • An enriched entity-resolution API
  • Grammars can be serialized to a binary format

Tuesday, December 2, 2003

YesLogic has released Prince 3.0, a $295 payware batch formatter for Linux and Windows that produces PDF and PostScript from XML documents with CSS stylesheets. New features in Prince 3.0 include:

  • JPEG, PNG and TIFF support
  • New CSS3 Paged Media properties
  • Preserves white space in preformatted text
  • Relative font sizes, indents, margins, padding and borders
  • Inline blocks and inline tables
  • Improved style sheets for XHTML and DocBook

Tom Bradford has posted the third beta of dbXML 2.0, an open source native XML database written in Java and published under the GNU General Public License. New features in version 2.0 include:

  • Journaling transactions
  • XSLT transformations
  • Full text indexing and Full text querying
  • Pluggable security models
  • SSL connection support
  • JSP Tag Library

Java 1.4 is required.

Monday, December 1, 2003

Oleg Tkachenko has released XInclude.NET 1.2, an open source implementation of the XInclude 1.0 Last Call Working Draft written in C# for the .NET platform. This supports the XPointer framework, shorthand pointers, and the element(), xmlns(), and xpath1() schemes. The .NET Framework 1.0 is required. Version 1.2 adds support for the XInclude syntax introduced in the second, and latest, last call working draft.
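
For readers who haven't used XInclude, here's a rough feel for what a processor does, sketched with Python's standard `xml.etree.ElementInclude` module rather than XInclude.NET. The in-memory `DOCUMENTS` dictionary and `dict_loader` function are hypothetical stand-ins for files on disk. ElementInclude only handles whole-document includes (no XPointer), and the namespace URI shown is the one it recognizes, which may differ from the one used in the late-2003 drafts.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for chapter1.xml and license.txt.
DOCUMENTS = {
    "chapter1.xml": "<chapter><title>Getting Started</title></chapter>",
    "license.txt": "Copyright 2003",
}

def dict_loader(href, parse, encoding=None):
    """Custom loader: an Element for parse="xml", plain text for parse="text"."""
    source = DOCUMENTS[href]
    return ET.fromstring(source) if parse == "xml" else source

MASTER = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter1.xml"/>
  <legal><xi:include href="license.txt" parse="text"/></legal>
</book>"""

root = ET.fromstring(MASTER)
# Replace each xi:include element in place with the content it points to.
ElementInclude.include(root, loader=dict_loader)
print(ET.tostring(root, encoding="unicode"))
```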

Sunday, November 30, 2003

The W3C XML Protocol Working Group has published a note on SOAP Optimized Serialization Use Cases and Requirements. This note makes it pretty clear that the group intends to move away from XML as the format for SOAP messages. That's fine. If XML is not suitable for their use-cases, then they should use something else. One tool does not fit all needs. However, I do wish the working group would be more open and up-front that this is in fact what they're doing. If you're not going to use XML, please stop using the word "XML" to describe what you are using. It's not XML and it's not going to be.

Saturday, November 29, 2003

My Google number is 51,900. Pretty good. I'm running ahead of Grady Booch, Bertrand Meyer, and Jerry Pournelle but behind Linus Torvalds, Tim Berners-Lee, Jesus Christ, and the Beatles.

I'm pretty sure Booch, Meyer, and Pournelle are all more famous than me, and deservedly so. I suspect this has a lot to do with how much of your fame comes from the Web, and how much you self-post. I personally generate thousands of pages a year with my name on them between this site, seminar notes, archived mailing list postings, and other similar content. Plus, I suspect Google counts the pages on different URLs for my sites (e.g. http://www.cafeconleche.org/ and http://www.ibiblio.org/xml) as separate pages, so my number should be divided by at least two. I can't let my head get too big.

One advantage to such huge amounts of self-generated content is that what you're likely to find out about me on the web is pretty much what I want you to know about me. I control my own image. I was curious to see how much of the information I don't want random surfers to know shows up on the Web, so I did some searches for my name in conjunction with my home address, social security number, phone number, and so forth. I am pleased to report that I don't think you can find any of that without paying for it or asking me. In the past some of this data popped up in the domain name registration records and Alexa, but I think I've managed to scrub all that. There is still one way to find my address and home phone number on the Web, but you've got to think about it. You can't just google me. And now that I've noticed it, I'm going to see what I can do about scrubbing it too.

Friday, November 28, 2003

The W3C DOM Working Group has published Document Object Model (DOM) Level 3 Events Specification as a note. "This specification defines the Document Object Model Events Level 3, a generic platform- and language-neutral event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. The Document Object Model Events Level 3 builds on the Document Object Model Events Level 2." Changes since DOM Level 2 Events include:

  • Event groups so that stopPropagation() no longer stops event propagation entirely, only in a given event group.
  • Within an event group, event listeners are now ordered
  • Namespace support
  • CustomEvent, TextEvent, KeyboardEvent, and MutationNameEvent interfaces
  • The Event interface has a namespaceURI field and four new methods: isCustom(), stopImmediatePropagation(), isDefaultPrevented(), and initEventNS().
  • The EventTarget interface has four new methods:
    addEventListenerNS(namespaceURI, type, listener, useCapture, evtGroup)
    removeEventListenerNS(namespaceURI, type, listener, useCapture) 
    willTriggerNS(namespaceURI, type)
    hasEventListenerNS(namespaceURI, type)
  • The DocumentEvent interface has a canDispatch(namespaceURI, type) method.
  • The UIEvent interface has an initUIEventNS() method.
  • The MouseEvent interface has getModifierState(keyIdentifierArg) and initMouseEventNS(...) methods.
  • The MutationEvent interface has an initMutationEventNS() method.
  • The EventException class has a DISPATCH_REQUEST_ERR constant

Thursday, November 27, 2003

XML Europe will be returning to Amsterdam this year, from April 18-21 at the RAI Centre. Proposals are due by January 5. I may submit one myself and attend this year for the first time. After all, it's in Amsterdam. :-)


The Big Faceless Organization has released the Big Faceless Report Generator 1.1.12, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. This is mostly a bug fix release. Java 1.2 or later is required.

Wednesday, November 26, 2003

Effective XML has been published on Safari. Safari subscribers can now read it in its entirety online.

Meanwhile, Tech Book Report has published another nice review of Effective XML, and the book has jumped way up in the Amazon rankings since the Slashdot review came out. Check it out if you haven't already.


The W3C Math Working Group has published three notes:

Bound Variables in MathML
"Bound variables are central representational primitives in mathematical languages. They allow one to express functions, quantification, and operators with qualifiers. The first edition of the MathML 2.0 Recommendation [MathML2] was somewhat vague about the identity conditions on bound variables, and as a consequence Content MathML applications were left to guess the exact meaning. This Note provides some of the rationale behind how this has been clarified in the second edition"
Structured Types in MathML 2.0
"This Note discusses the facilities that are available in the MathML 2.0 Recommendation to facilitate the capturing of mathematical type information. It demonstrates how a combination of these features can be systematically used to provide support for general mathematical types."
Units in MathML
"MathML is an XML application for describing mathematical notation, capturing both its structure and content. As such, its scope does not extend to include units - determinate quantities adopted as standards of measure - which nevertheless, by their very nature, occur in an applied mathematical setting. This Note makes recommendations and suggestions for how units can be incorporated into MathML."

Tuesday, November 25, 2003

Slashdot has posted a review of Effective XML. The comments are mostly insightful. They've already given me one idea for another book. I'm in the process of reading through them all and responding right now. If anyone cares to moderate my comments up so they'll be seen, it would be appreciated. Bottom line: they like the book, and Natalie Portman can pour hot grits down my pants. :-)


Release early and release often. I've posted the second alpha of the XQuisitor GUI XQuery tool. Alpha 2 adds a Query menu, provides more mnemonics, cleans up the user interface a tad, and fixes a couple of bugs.


The Ginger Alliance has released Sablotron 1.0.1, an open source XML processor for C++. Sablotron supports XSLT 1.0, XPath 1.0, DOM Level 2, and some extension functions from EXSLT. Sablotron is dual licensed under the Mozilla Public License 1.1 and the GNU General Public License (GPL). It should run on most modern Windows and Unix systems.

Monday, November 24, 2003

I've posted the first alpha of XQuisitor, a simple open source GUI for XQuery written in Java that I demoed last week at the New York XML SIG. The hard work of implementing XQuery was done by Michael Kay. XQuisitor is just a layer of GUI icing on top of the Saxon cake.

XQuisitor is still rough around the edges, but I think it's useful enough for anyone who wants to explore XQuery. I find it makes experimenting significantly easier than using the Saxon command line interface. Bug reports, quibbles about the user interface, and so forth are appreciated. The GUI is written in Swing.

It's mostly been tested on Linux and a little on Mac OS X 10.2. It should run on any Java 1.4 platform, but I haven't verified that. Java 1.4 and Saxon 7.8 are required. XQuisitor is published under the GPL.

Sunday, November 23, 2003

Tom Bradford has posted the second beta of dbXML 2.0, an open source native XML database written in Java and published under the GNU General Public License. New features in version 2.0 include:

  • Journaling transactions
  • XSLT transformations
  • Full text indexing and Full text querying
  • Pluggable security models
  • SSL connection support
  • JSP Tag Library

This second beta fixes bugs. Java 1.4 is required.

Saturday, November 22, 2003

The XML Apache Project has released version 2.6 of Xerces-J, the popular open source XML parser for Java. This expands the experimental support for DOM Level 3 and moves SAX up to 2.0.1. In addition, some bugs were fixed, including the correct setting of base URIs for documents read from a redirected HTTP request. It also enables XML 1.1 by default, a very bad decision. Currently I can count the number of people in the world who need XML 1.1 on the fingers of one hand. Hell, that's too generous. I can count them on one thumb. There's no excuse for making this the default option. I can't say I'm surprised by this, though. IBM is the primary author of Xerces, and the prime mover behind XML 1.1. Despite all the nice talk about supporting minority languages, the real motive for XML 1.1 is to support the non-standard, non-interoperable line breaking conventions used on some IBM mainframes. I'm working on figuring out how to disable the XML 1.1 support. Once that's done, I'll probably make this the default parser for XOM, since other than the XML 1.1 issues, Xerces is a very nice parser.
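Until there's a switch to turn it off, one stopgap is to let the parser run and then refuse any document whose XML declaration says 1.1. Here is a minimal sketch. It assumes the JAXP DOM parser bundled with Java 5 or later, which is itself derived from Xerces, rather than Xerces-J 2.6 configured directly. It relies on DOM Level 3's Document.getXmlVersion(). The class name Xml10Only is hypothetical, and this is not a Xerces feature or property.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parse a document, then reject it if it declares XML 1.1.
public class Xml10Only {

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // DOM Level 3 Core: getXmlVersion() reports the version from the
        // XML declaration, or "1.0" when the document has no declaration.
        if (!"1.0".equals(doc.getXmlVersion())) {
            throw new IllegalArgumentException(
                "Rejected XML " + doc.getXmlVersion() + " document");
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<?xml version='1.0'?><a/>").getDocumentElement().getTagName());
        try {
            parse("<?xml version='1.1'?><a/>");
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

This still lets the parser accept 1.1 internally. It only stops such documents from reaching the rest of the application.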


Intesis has released eSVG 2.0, an implementation of the subsets of SVG 1.1 and SVG Mobile specifications designed for integration into embedded systems. eSVG provides multithreaded eSVG scripting according to the SVG DOM 2 interface specification. eSVG scripting is based on SpiderMonkey (JavaScript-C) Engine and ORMIDE. eSVG supports most SVG Tiny profile features, SVG Basic profile features, SVG DOM interface entries, and SMIL animation. New features in 2.0 include:

  • Pocket PC (Windows CE) support
  • HTML functions can be called from JavaScript
  • Support for dynamic image and pattern reference/content changing
  • Raster image export
  • Cursor handling
  • ActiveX/DLL entries for printing

eSVG runs on Windows 98/NT/2000/ME/XP, Windows CE, and UniOP MMI. eSVG costs €380 for 3 developer/50 runtime licenses.


EvolGrafiX has released XStudio 2.0, a €579 payware eSVG-based WYSIWYG SVG authoring tool.


Syntext has released Serna 1.0.1, a $299 payware XSL-based WYSIWYG XML Document Editor for Windows and Linux. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, and spell checking. Pricing is reduced to $149 until the end of the year. If you actually want support, it will cost you $99 extra. Personally, I'm not willing to pay for any product that doesn't include support.

Friday, November 21, 2003

The W3C Quality Assurance (QA) Activity has posted the candidate recommendation of QA Framework: Specification Guidelines. According to the abstract, "The principal goal of this document is to help W3C Working Groups to write clearer, more implementable, and better testable technical reports. It provides both a common framework for specifying conformance requirements and definitions, and also addresses how a specification might allow variation among conforming implementations, both of which facilitate the generation of test materials. The material is presented as a set of organizing guidelines and verifiable checkpoints." Comments are due by May 10.


Wednesday, November 19, 2003

I've posted XOM 1.0d22, my tree-based streaming API for processing XML with Java. This release collects a large number of small fixes and improvements including recoverable validity errors, node-level DOM conversion, Unicode normalization form C support, and more. There are API level changes in this release that will affect many users. The largest is that NodeList has been renamed Nodes. The second largest is that ParseException has been renamed ParsingException to avoid a name conflict with java.text.ParseException. However, I am getting very close to API freeze. There are only a couple of possible API changes still being considered, all in Canonicalizer and NodeFactory. The rest of the API should be stable. The XInclude support is going to have to be changed to support the latest working draft (this is probably the last release that will support the old namespace URI and syntax), but I don't anticipate this requiring backwards incompatible changes to the API. Details on the web page.

Tuesday, November 18, 2003

I've posted the notes from last night's XQuery presentation to the New York XML Special Interest Group. We had a small but enthusiastic crowd of about 20 people. One thing that became apparent by the end of the evening was that the XPath 2.0 data model is deeply confusing, at least on first presentation, especially compared to XSLT 1.0. The problem surfaced very early with my first sample query:

   for $t in doc("bib.xml")/bib/book/title
   return
      $t 

This replaced the first query I've used in some other presentations:

<bib>
  {
   for $t in doc("bib.xml")/bib/book/title
   return
    <book>
     { $t }
    </book>
  }
</bib>

I thought the first query was simpler because it didn't use direct element constructors, and made it more apparent that an XQuery is not an XML document. (Another design feature several attendees objected to, by the way. Trying to explain exactly when, where, and how it was necessary to escape different content was another high, holy mess.) However, the second query produces a single element, which can obviously be serialized as an XML document. People liked this. The first query produces a sequence of element nodes, which Saxon serializes as several document fragments, like so:

<?xml version="1.0" encoding="UTF-8"?>
<title>TCP/IP Illustrated</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>Advanced Programming in the Unix Environment</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>Data on the Web</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>The Economics of Technology and Content for Digital TV</title>

Aside from the text declarations, this isn't any different from what you might see in XSLT 1.0. And I pointed out that this was hardly the only possible serialization of the result sequence. For example, if you turn on wrapping, Saxon gives you this output instead:

<?xml version="1.0" encoding="UTF-8"?>
<result:sequence xmlns:result="http://saxon.sf.net/xquery-results">
   <result:element>
      <title>TCP/IP Illustrated</title>
   </result:element>
   <result:element>
      <title>Advanced Programming in the Unix Environment</title>
   </result:element>
   <result:element>
      <title>Data on the Web</title>
   </result:element>
   <result:element>
      <title>The Economics of Technology and Content for Digital TV</title>
   </result:element>
</result:sequence>

The attendees liked this result even less. And they really hated the idea that a different tool might produce still a third or a fourth format. They really, really wanted one unique XML output from a query, possibly modulo insignificant details like the use of empty-element tags or boundary white space. Nobody objected when I turned on Saxon's option to pretty print the output because they didn't view that as creating a different result from the same query.

In XSLT 1.0 all output is XML. A transformation creates a result tree, which can always be serialized as either an XML document or a well-formed document fragment. In XSLT 2.0 and XQuery the output is not a result tree. Rather, it is a sequence. This sequence may contain XML; but it can also contain atomic values such as ints, doubles, gYears, dates, hexBinaries, and more; and there's no obvious or unique serialization for these things. For instance, what exactly are you supposed to do with an XQuery that generates a sequence containing a date, a document node, an int, and a parentless attribute? How do you serialize this construct? That a sequence has no particular connection to an XML document was very troubling to many attendees.

Looking at it now, I'm seeing that perhaps the flaw is in thinking of XQuery as like XSLT; that is, a tool to produce an XML document. It's not. It's a tool for producing collections of XML documents, XML nodes, and other non-XML things like ints. (I probably should have said it that way last night.) However, the specification does not define any concrete serialization or API for accessing and representing these non-XML collections. That's a pretty big hole left to implementers to fill.

Monday, November 17, 2003

My e-mail's on the fritz at the moment, apparently due to a change in the IBiblio root certificate. If you absolutely have to contact me right now, use the telephone. (Anyone who absolutely has to contact me right now already knows the number.)


Word of the day: Boundary Whitespace. I just discovered this in the XQuery working draft. It's something we've long needed a good term for. For example, consider this element:

<name>
  <first>Jada</first>
  <middle>Pinkett</middle>
  <last>Smith</last>
</name>

Boundary white space is the space between the end of one tag and the beginning of the next, when the text between those tags does not contain anything except white space. Traditionally, this has been called ignorable white space, which is wrong because it isn't ignorable, or white space in element content, which only really applies when a DTD specifies that the element should not contain anything except child elements.

Technically, XQuery also uses this term to include whitespace between tags and query braces, as in this example:

<name>  
  { for $name in //name/text()
    return $name
} </name>

However, outside XQueries (which aren't XML documents) the braces have no particular meaning, so in that case I feel justified in saying boundary whitespace occurs exclusively between tags.
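Outside XQuery, the definition is easy to apply mechanically to a parsed document. Any text node made up entirely of XML white space (space, tab, carriage return, line feed) sits between two tags. Here is a minimal sketch using the JDK's DOM parser. The class name is hypothetical. With no DTD in play, the parser keeps every one of these nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: counts text nodes that contain nothing but XML white
// space, i.e. boundary white space between one tag and the next.
public class BoundaryWhitespace {

    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Only plain text nodes qualify; CDATA sections are not counted.
    static boolean isBoundary(Node node) {
        if (node.getNodeType() != Node.TEXT_NODE) return false;
        String data = node.getNodeValue();
        for (int i = 0; i < data.length(); i++) {
            if (!isXmlSpace(data.charAt(i))) return false;
        }
        return true;
    }

    // Recursively counts boundary white-space nodes in a subtree.
    static int count(Node node) {
        int total = isBoundary(node) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            total += count(child);
        }
        return total;
    }

    public static int count(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return count(doc.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String name = "<name>\n  <first>Jada</first>\n  <middle>Pinkett</middle>\n"
                    + "  <last>Smith</last>\n</name>";
        System.out.println(count(name)); // 4
    }
}
```

For the name element, the four nodes are the line break and indentation before each of the three children, plus the final line break before the end-tag.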

Kudos to the XQuery working group for coming up with this. I'm not very fond of XQuery itself, but this is at least one useful contribution to emerge from that group.


Altova has released XMLSpy 2004, a $399 payware XML editor for Windows. New features in this release include:

  • XML Differencing
  • XPath 2.0 Analyzer
  • Enhanced Database Connectivity including native support for Oracle9i.
  • Generate Project Files for other popular 3rd party IDEs including Borland C# Builder and Mono
  • Visual Studio.NET integration

Tal Rotbart has written a java.sql.ResultSet DOM Wrapper. The wrapper implements the DOM Document interface on top of a JDBC result set. The ResultSetDocument does not contain the actual result-set data and meta-data, but instead contains references to row and column indices. This makes it very memory efficient, but it requires that the result set be created with scroll capabilities. The wrapper is also read-only. Interesting idea. I've seen this done with SAX (I've done that myself), but this is the first time I've seen it done with DOM. The wrapper is published under the LGPL.

Sunday, November 16, 2003

The W3C Scalable Vector Graphics Working Group has posted a new working draft of Scalable Vector Graphics (SVG) 1.2. This is still not a complete, finished spec; but there are numerous new features including:

  • Vector effects can define a transformation of a primitive shape's outline that happens before it is drawn
  • The style element now uses an xlink:href attribute to link in the stylesheet
  • No more transform attribute on the tspan element
  • No more z-index property
  • A new handler element contains code to be executed in response to an event.
  • background-fill only allows solid colors
  • Support for DOM Level 3 Core, Events and XPath
  • New DOM interfaces for audio and video, and all SVG Media interfaces.
  • A new transition element defines a single transition class, as in SMIL.
  • Tool tips now reference hint elements rather than title elements.
  • desc, title and metadata elements can be simple XLinks
  • The switch element is now allowed anywhere.

Saturday, November 15, 2003

Tom Bradford has posted a beta of dbXML 2.0, an open source native XML database written in Java and published under the GNU General Public License. New features in version 2.0 include:

  • Journaling transactions
  • XSLT transformations
  • Full text indexing and Full text querying
  • Pluggable security models
  • SSL connection support
  • JSP Tag Library

Java 1.4 is required.

Friday, November 14, 2003

I've updated the XML Conferences page. There are definitely fewer of these than there used to be. At least one company that did several XML shows a year has gone out of business, one has dropped out of the market completely, and two are down to less than one show a year.

For myself, I'll be chairing the XML track at Software Development 2004 West again next year. We've got a good program with two new tutorials from Jason Hunter and Eric van der Vlist, and 12 seminars of which eleven are completely new at this show. But it's still only one track in a larger show. I'll probably attend WWW 2004, mostly since it's in New York and therefore convenient and cheap for me. I may submit a poster on XOM design principles or some such. And for once I've got a good idea for an Extreme Markup Languages paper far enough in advance that I may be able to get back to Montreal this year. But even with all that, it looks like I'll be doing a lot less speaking in 2004 than in the past.


Sam Tregar has released XML::Validator::Schema 1.05, a Perl module that validates XML documents against a subset of the W3C XML Schema language. It is implemented as a SAX filter on top of XML::SAX. Version 1.05 supports many more simple types.


Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI) that fixes various bugs in the HTML and DTD parsers. Other tools in the package include a generic XML pull parser, a RELAX NG validator, and a DTD to XML converter.


The Apache Project has released Cocoon 2.1.3, an open source "web development framework built around the concepts of separation of concerns and component-based web development. Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming." Cocoon can assemble data from many sources including filesystems, SQL databases, LDAP, native XML databases, and SAP. It can customize the output to generate HTML, WML, PDF, SVG, and RTF from the same inputs. Processes it supports include XSL transformation and XInclude resolution. Cocoon can run as a servlet inside an existing web server or standalone through a commandline interface. 2.1.3 is primarily a bug fix release.


Bare Bones Software has released BBEdit 7.1. This is a free update for all 7.0 users. BBEdit is the $179 payware Macintosh text/HTML/XML/programmer's editor I normally use to write this page. New features in this release include secure FTP support and live preview for HTML files. Mac OS X 10.2 or later is required. Mac OS 9 is not supported.


Brendan Macmillan has posted version 2.1.4 of Java Serialization for XML (JSX) 2, a library for converting Java objects into streams of XML and reading the objects back from the streams. To use it, replace ObjectOutputStream with JSX.ObjectWriter and ObjectInputStream with JSX.ObjectReader. This release adds an XML schema of the XML format for classes serialized with JSX.

Thursday, November 13, 2003

Word of the day: Flashturbation, "The practice of using Macromedia Flash on Web sites for nothing more than demonstrating its cool 'whiz-bang' features." Why didn't I know this word sooner? It's such an accurate description of so many sites today.


The W3C XQuery and XSLT Working Groups have dropped another load of working drafts into the world:

I guess I'll have to digest these before my talk about XQuery on Monday to the New York XML User's Group. (E-mail Walter Perry at wperry@xml-sig.org for details and admission.)

On my first read-through, the changes since the last drafts seem fairly minor. In the XPath 2.0 draft, "The section entitled "SequenceType Matching" has been rewritten and includes new material on handling of unrecognized types. A new concrete type, xdt:untypedAny, has been introduced, and the isnot comparison operator has been removed. Rules for static and dynamic implementations have been clarified." The XQuery 1.0 changes seem to be limited to these changes in XPath 2.0.

The XSLT 2.0 draft states, "there are relatively few technical innovations in this draft, but a substantial amount of editorial revision and clarification. The technical changes of note are the ability of many XSLT instructions (for example, xsl:attribute and xsl:value-of) to use a select attribute or a contained sequence constructor interchangeably, and the introduction of tunnel parameters which allow parameter values to be passed from a high-level template rule to a low-level rule without being declared in all the intermediate templates. Named sort keys and the sort function have been replaced with a new xsl:perform-sort instruction. There have been revisions to the date formatting functions, aligning them with the xsl:number instruction and transferring some of the functionality into xsl:number to make it more widely applicable."


Michael Kay has released Saxon 7.8, an experimental open source implementation of large parts of XSLT 2.0 and XPath 2.0 in Java. Version 7.8 brings the syntax into sync with the XQuery/XPath 2.0/XSLT 2.0 working drafts of November 12 including tunnel parameters and the ability to import functions from an XQuery library module into an XSLT stylesheet. It also fixes numerous bugs. Java 1.4 is required. Saxon is published under the Mozilla Public License 1.0.


The Big Faceless Organization has released the Big Faceless Report Generator 1.1.11, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. Java 1.2 or later is required.

Wednesday, November 12, 2003

This coming Monday, November 17, at 7:00 P.M. I'll be talking about XQuery at the meeting of the XML Special Interest Group in downtown Manhattan (Goldman Sachs, 180 Maiden Lane, Room 30C). The meeting is free, but preregistration is required to get through security. To reserve a place at this meeting, e-mail Walter Perry at wperry@xml-sig.org. Please register by Friday to guarantee admission.


Dave Malcolm has posted Conglomerate 0.7.6, an open source GUI XML editor for Linux written in C, and based on libxml2 and the GTK+ and Gnome libraries. This release works much better with non-Roman scripts. Conglomerate is published under the GPL.


Norman Walsh has posted the fifth beta of DocBook 4.3, the XML application I used to write Processing XML with Java. According to Walsh, "This version fixes some small bugs in the definition of firstterm." There's also a RELAX NG version.


IBM's alphaWorks has updated its Web Services Tool Kit for Mobile Devices to support the Discovery Framework and Java APIs for processing WSIL documents. This toolkit also supports WCE and SMF environments (whatever those are). However, it only supports a subset of SOAP 1.1. Version 2.1.0 also drops support for kSOAP and BlackBerry.

Tuesday, November 11, 2003

The W3C XInclude working group has moved the XInclude specification back to last call working draft to address a couple of concerns that arose during candidate recommendation. The three major changes are:

  • Fragment identifiers in URIs are ignored. Instead, XPointers must be placed in a new xpointer attribute. For example, instead of writing <xinclude:include href="http://www.cafeconleche.org/#today"/> you must now write <xinclude:include href="http://www.cafeconleche.org/" xpointer="today"/>
  • Three new attributes, accept, accept-charset, and accept-language, are used to perform content negotiation with web servers.
  • The namespace URI is now http://www.w3.org/2003/XInclude instead of http://www.w3.org/2001/XInclude
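Migrating old documents to the new syntax is mostly mechanical. Here is a minimal sketch of the attribute rewrite. It assumes the fragment identifier can be copied verbatim into the xpointer attribute, so no percent-unescaping is done. The class and constant names are hypothetical. The two namespace URIs are the ones given in the draft.

```java
// Hypothetical migration helper: split an old-style XInclude href that carries
// a fragment identifier into the href and xpointer attribute values the new
// draft expects.
public class XIncludeMigration {

    public static final String OLD_NS = "http://www.w3.org/2001/XInclude";
    public static final String NEW_NS = "http://www.w3.org/2003/XInclude";

    /** Returns {href, xpointer}; xpointer is null when there is no fragment. */
    public static String[] split(String href) {
        int hash = href.indexOf('#');
        if (hash < 0) {
            return new String[] {href, null};
        }
        return new String[] {href.substring(0, hash), href.substring(hash + 1)};
    }

    public static void main(String[] args) {
        String[] parts = split("http://www.cafeconleche.org/#today");
        // Build the replacement element in the new namespace.
        System.out.println("<xinclude:include xmlns:xinclude=\"" + NEW_NS
            + "\" href=\"" + parts[0] + "\" xpointer=\"" + parts[1] + "\"/>");
    }
}
```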

In my opinion this draft is a step backwards. The xpointer attribute is a really ugly kludge to work around the failure of the XLink working group to define a fragment identifier syntax for XML. The accept attributes are underspecified, and open up some potential security holes.

Monday, November 10, 2003

Alexandre Brilliant has released JXMLPad 1.9.9, a €90 shareware JavaBean component for editing XML. This release supports multiple editors inside a JTabbedPane with a common toolbar. It also fixes various bugs. Java 1.2 or later is required.

Sunday, November 9, 2003

The W3C has posted the proposed recommendations of XML 1.1 and Namespaces in XML 1.1. These are bad ideas, for reasons I've elaborated before, both here and elsewhere. Allow me to quote myself:

Everything you need to know about XML 1.1 can be summed up in two rules:

  1. Don't use it.

  2. (For experts only) If you speak Mongolian, Yi, Cambodian, Amharic, Dhivehi, Burmese or a very few other languages and you want to write your markup (not your text but your markup) in these languages, then you can set the version attribute of the XML declaration to 1.1. Otherwise, refer to rule 1.

These drafts do not appear to introduce any substantive changes since the candidate recommendations. In particular, a change that would have prohibited line breaks in the XML declaration appears to have been rescinded at the last minute, probably because that would have required the spec to go back to last call. Instead, it now states that "The characters #x85 and #x2028 cannot be reliably recognized and translated until an entity's encoding declaration (if present) has been read. Therefore, it is a fatal error to use them within the XML declaration or text declaration."
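That rule exists because XML 1.1 adds #x85 (NEL) and #x2028 (LINE SEPARATOR) to the characters a parser must treat as line ends. Here is a minimal sketch of the XML 1.1 end-of-line normalization. It assumes the text has already been decoded into a Java String, so the encoding problem the quote describes doesn't arise here. The class name LineEnds11 is hypothetical.

```java
// Sketch of XML 1.1 end-of-line handling: CR LF, CR NEL, NEL, LINE SEPARATOR,
// and any CR not followed by LF or NEL all become a single LF.
// (An XML 1.0 processor translates only CR LF and a lone CR.)
public class LineEnds11 {

    public static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Swallow the LF or NEL that completes a two-character line end.
                if (i + 1 < text.length()) {
                    char next = text.charAt(i + 1);
                    if (next == '\n' || next == '\u0085') i++;
                }
            } else if (c == '\u0085' || c == '\u2028') {
                out.append('\n');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\r\u0085c\u0085d\u2028e\rf";
        System.out.println(normalize(mixed).equals("a\nb\nc\nd\ne\nf")); // true
    }
}
```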

Update: I missed something on my first read through. There is a substantive change in the proposed recommendation of XML 1.1 (in direct violation of the W3C's advertised process, I'll note). C0 control characters such as BEL and vertical tab are now allowed directly in XML documents. They no longer have to be escaped with numeric character references. Among other effects, this makes it much harder to detect documents whose encoding declaration is incorrect.

Update to the update: It appears this was an unintentional failure in the editing. Several working group members have told me they meant to forbid literal control characters. They simply forgot to do it. Expect at least one more draft before the final recommendation. However, even if this problem is fixed, I still think XML 1.1 is a bad idea, and should not be recommended.
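For comparison, XML 1.0's Char production rules out every C0 control except tab, line feed, and carriage return. That is why stray control characters are a cheap signal that a document's bytes were decoded with the wrong encoding. Here is a minimal sketch of that check. The class and method names are made up for illustration.

```java
// Finds the first character that XML 1.0's Char production does not allow:
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
public class Xml10Chars {

    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the UTF-16 index of the first illegal character, or -1 if none. */
    public static int firstIllegal(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);  // unpaired surrogates come back as-is and fail the test
            if (!isXml10Char(cp)) return i;
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegal("<a>ok</a>"));          // -1
        System.out.println(firstIllegal("<a>bell\u0007</a>"));  // 7
    }
}
```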

Comments on both drafts are due by December 5.

Saturday, November 8, 2003

The W3C Document Object Model (DOM) working group has released candidate recommendations of the Document Object Model (DOM) Level 3 Core Specification and the Document Object Model (DOM) Level 3 Load and Save Specification.

Changes since the last draft in the Core specification include

  • The DOMStringList interface now has a contains(String str) method.
  • The NameList interface has contains(String name) and contains(String name, String namespaceURI) methods.
  • The TypeInfo class now has an isDerivedFrom() method to tell whether one type is derived from another by restriction, extension, union, or list.
  • In DOMLocator, the getOffset() method has been split into getByteOffset() and getUtf16Offset(). (I really don't approve of this change. It should be a single character offset as in the last draft.)
  • DOMConfiguration has a new getParameterNames() method that returns a list of all known parameters.
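The same position in a text has different UTF-16, character, and byte offsets as soon as a character outside the Basic Multilingual Plane shows up. Here is a minimal sketch in plain Java that makes this visible (Java 7 or later for StandardCharsets). It assumes UTF-8 input, so the byte offset is a UTF-8 byte count. The class and method names are hypothetical, and nothing here calls DOMLocator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: for a position given as a UTF-16 index, report the
// UTF-16 offset, the character (code point) offset, and the UTF-8 byte offset.
public class Offsets {

    public static int[] offsets(String text, int utf16Offset) {
        int charOffset = text.codePointCount(0, utf16Offset);
        int byteOffset = text.substring(0, utf16Offset)
                             .getBytes(StandardCharsets.UTF_8).length;
        return new int[] {utf16Offset, charOffset, byteOffset};
    }

    public static void main(String[] args) {
        // "a", then U+1D504 MATHEMATICAL FRAKTUR CAPITAL A (a surrogate pair), then "b"
        String text = "a\uD835\uDD04b";
        int[] o = offsets(text, text.indexOf('b'));
        System.out.println(o[0] + " " + o[1] + " " + o[2]); // 3 2 5
    }
}
```

The "b" is the third character. It sits at UTF-16 offset 3 and at UTF-8 byte offset 5.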

Of special note with this release, "Given the lack of implementation commitments regarding character normalization, the DOM Working Group considers it "at risk". This affects the "check-character-normalization" and "normalize-characters" parameters defined in the DOMConfiguration interface in the [DOM Level 3 Core]. The Working Group may remove the parameters before requesting Proposed Recommendation status."

In the Load-Save draft, the most substantive change appears to be that the optional convenience interfaces DocumentLS and ElementLS have been removed completely. Editorially, methods like createDOMParser() and createDOMSerializer() appear to have been renamed createLSParser() and createLSSerializer(). Interfaces like DOMInput and DOMFilter are now LSInput and LSFilter. DOMSerializer.writeURI() is now LSSerializer.writeToURI(). DOMInput.getCertified() and setCertified() are now LSInput.getCertifiedText() and setCertifiedText().
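The renamed interfaces survived into Java: since Java 5, org.w3c.dom.ls includes LSParser, LSInput, and LSSerializer. Here is a minimal round-trip sketch. It assumes a JDK whose built-in DOM implementation also implements DOMImplementationLS, which is true of the bundled Xerces-derived one. The class name is hypothetical. The JDK's interfaces follow the final recommendation, which may differ in details from this candidate recommendation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

// Parse a string with LSParser, then write it back out with LSSerializer.
public class LoadSaveRoundTrip {

    public static String roundTrip(String xml) throws Exception {
        // The JDK's DOM implementation also implements DOMImplementationLS.
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory
            .newInstance().newDocumentBuilder().getDOMImplementation();

        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        Document doc = parser.parse(input);

        LSSerializer serializer = ls.createLSSerializer();
        // DOMConfiguration parameter: leave out the XML declaration.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<a><b>x</b></a>"));
    }
}
```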

Comments are due by November 30.


John Krasnay has posted Vex 0.5, an open source visual editor for XML with a word processor-like interface that is written in Java. "It is targeted toward users of XML schemas like DocBook and XHTML that represent human-readable documents." Vex is published under the GPL.

Friday, November 7, 2003

BEA Systems has released the Streaming API for XML (StAX) along with a reference implementation. This is a Java-based, pull-parsing API for XML. StAX offers two approaches. XMLStreamReader and XMLStreamWriter are a cursor API designed to read and write XML as efficiently as possible. XMLEventReader and XMLEventWriter are an iterator API designed to be easy to use, event based, easy to extend, and allow easy pipelining. The iterator API sits on top of the cursor API.

This API isn't quite rigorous enough for my tastes. It allows implementations to miss well-formedness errors, generate malformed XML, and throw away information without notifying the client. However, it's probably the best pull API yet.
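The difference between the two styles is easiest to see side by side. Here is a minimal sketch that counts start-tags once with the cursor API and once with the iterator API. It assumes Java 6 or later, which bundles the JSR 173 API in javax.xml.stream. BEA's 2003 reference implementation may differ in details. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

public class StaxCount {

    // Cursor API: next() returns an int event type, and the reader itself
    // is positioned on the current event.
    public static int countWithCursor(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) count++;
        }
        reader.close();
        return count;
    }

    // Iterator API: each call hands back an XMLEvent object the client
    // can hold on to or pass down a pipeline.
    public static int countWithIterator(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) count++;
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<bib><book><title>TCP/IP Illustrated</title></book></bib>";
        System.out.println(countWithCursor(xml) + " " + countWithIterator(xml)); // 3 3
    }
}
```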


ActiveState has released Visual XSLT 2.0, a payware XSLT plug-in for the Visual Studio .NET IDE. New features in version 2.0 include Just-In-Time (JIT) Debugging, and a Visual Schema Mapper. Commercial licenses for Visual XSLT 2.0 are $295. Upgrades are $99.95.

Thursday, November 6, 2003

The W3C has posted the minutes and position papers from September's W3C Workshop on Binary Interchange of XML.


Oleg Tkachenko has released nxslt 1.3, a Windows command line utility for accessing the .NET XSLT engine. This release supports the candidate recommendation of XInclude, adds a few EXSLT extension functions, and improves performance of existing functions. nxslt is written in C# and requires the .NET Framework version 1.0 to be installed.

Wednesday, November 5, 2003

Daniel Veillard has released version 2.6.2 of libxml2, the open source XML C library for Gnome. This is mostly a bug fix release, but also adds an API to create a W3C XML Schema from an instance document.


Veillard has also released version 1.1.0 of libxslt, the GNOME XSLT library for C and C++. This is a bug fix release.

Saturday, November 1, 2003

The W3C Core Working Group has published a proposed edited recommendation of XML 1.0, third edition. "This third edition is not a new version of XML. As a convenience to readers, it incorporates the changes dictated by the accumulated errata (available at http://www.w3.org/XML/xml-V10-2e-errata) to the Second Edition of XML 1.0, dated 6 october 2000. In addition, markup has been introduced on a significant portion of the prescriptions of the specification, clarifying when prescriptive keywords such as MUST, SHOULD and MAY are used in the formal sense defined in [IETF RFC 2119]."

Friday, October 31, 2003

I've fixed the permalinks. It's not clear if anyone's using them since they were broken for the last three weeks and nobody noticed.


The XML Apache Project has released Xalan-Java 2.5.2, an open source XSLT processor. Most of the changes in this release are bug fixes or small performance improvements. There's also an alternate binary distribution that puts XSLTC and Xalan-Interpretive in separate jar files so they can be distributed or bundled independently of each other.

Thursday, October 30, 2003

Daniel Veillard has released version 2.6.1 of libxml2, the open source XML C library for Gnome. This release fixes lots of bugs introduced in the 2.6 tree.


Peter J. Jones has posted xmlwrapp 0.4.4, a C++ library for working with XML built on top of Daniel Veillard's libxml2. This release now works with libxml2 2.6. xmlwrapp is published under a BSD license.


XML Benchmark 1.2.2, a C/C++/Java toolset for benchmarking XML parsers including libxml2, Xerces, Oracle XDK, Expat, RXP, QT, and Crimson, has been released. Benchmarks include parsing (native, SAX, DOM), DOM manipulation, schema validation, XSL transformation, and XML signature and encryption. Version 1.2.2 adds support for XML Security 1.2 and Arabica.

I've learned to treat benchmarks with a 20-pound bag of rock salt until proven otherwise. However, this product gets at least one thing right. It lets you plug in "Any valid XML file" so you can test parsers on the kind of documents you're interested in rather than on whatever the benchmark vendor has. Most parsers exhibit wildly varying performance characteristics depending on the type of XML document (large or small, record-like or narrative, many attributes or few attributes, etc.). It's not clear whether or not this tool can test well-formed but invalid documents.
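
The advice to benchmark on your own documents takes only a few lines to follow. This is a hypothetical harness in Python, not part of XML Benchmark. It times Python's built-in ElementTree and minidom parsers on two synthetic document shapes; you would substitute the bytes of your own files, for example open("mydoc.xml", "rb").read().

```python
import time
import xml.etree.ElementTree as ET
from xml.dom import minidom

def time_parser(parse, document: bytes, repeat: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeat` parses of `document`."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        parse(document)
        best = min(best, time.perf_counter() - start)
    return best

# Two synthetic document shapes: record-like with many attributes, and narrative text.
records = b"<rows>" + b"".join(
    b'<row id="%d" a="1" b="2" c="3"/>' % i for i in range(2000)
) + b"</rows>"
narrative = b"<book>" + b"<p>Lorem ipsum dolor sit amet. </p>" * 2000 + b"</book>"

parsers = {"ElementTree": ET.fromstring, "minidom": minidom.parseString}
for doc_name, doc in [("records", records), ("narrative", narrative)]:
    for parser_name, parse in parsers.items():
        print(f"{doc_name:10} {parser_name:12} {time_parser(parse, doc):.4f}s")
```

Taking the best of several runs reduces noise from other processes. The absolute numbers depend on the machine, so only the relative comparison on your own data means much.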

Wednesday, October 29, 2003

Norman Walsh has posted the fourth beta of DocBook 4.3. Changes in this release include:

  • A number of new values for the class attribute of the database element
  • Optional title attribute on glosslist element
  • Revision allows author or authorinitials.
  • void is now optional on methodsynopsis, constructorsynopsis, and destructorsynopsis.
  • emailmessage, webpage, and newsposting are possible values for the pubwork attribute of citetitle
  • process, service, server, and daemon are new class values of systemitem
  • blockinfo is allowed on blockquote
  • Added initializer to paramdef
  • prefix, namespace, and localname are new class values of sgmltag

Tuesday, October 28, 2003

I've posted two more chapters from Effective XML that address the proper design of markup:

  1. Make structure explicit through markup
  2. Store metadata in attributes

I was inspired to put these up by an ongoing discussion on xml-dev about Microsoft's XAML vs. Mozilla's XUL. In my opinion, Microsoft gets some things right that XUL gets wrong, as these two chapters explain.


The IETF has posted a new draft of Internationalized Resource Identifiers (IRI). In brief, an IRI is like a URI that is not limited to ASCII. It can contain characters such as é and Θ.
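
The draft's mapping from an IRI to a plain URI amounts to encoding each non-ASCII character as UTF-8 and percent-encoding the resulting bytes. Here is a simplified Python sketch. The iri_to_uri function is hypothetical, and it ignores host names, which would need IDNA (punycode) conversion instead.

```python
from urllib.parse import quote

def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI.

    Each non-ASCII character is encoded as UTF-8 and every resulting byte
    is percent-encoded. ASCII characters pass through untouched.
    Simplification: host names are not converted to IDNA (punycode).
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)

print(iri_to_uri("http://example.com/caf\u00e9/\u0398"))
# http://example.com/caf%C3%A9/%CE%98
```
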

Monday, October 27, 2003

Denny Vrandecic has released Xml4Ada95, an open source DOM language binding for Ada. It is based on Xerces, and therefore supports validation, schemas, and more. Xml4Ada95 is licensed under a BSD-like license.


Alexandre Brillant has released JXMLPad 1.9.7, a €90 shareware JavaBean component for editing XML. This is a bug fix release. Java 1.2 or later is required.

Sunday, October 26, 2003

Sun has released the Java Web Services Developer Pack 1.3. This includes the reference implementation for JAXB 1.0 as well as

  • Java Server Faces 1.0EA4
  • Java Server Pages Standard Tag Library (JSTL) 1.1 EA
  • XML and Web Services Security 1.0EA2
  • Java Architecture for XML Binding (JAXB) 1.0.2
  • Java API for XML Messaging (JAXM) 1.1.1
  • Java API for XML Processing (JAXP) 1.2.4
  • Java API for XML Registries (JAXR) 1.0.5
  • Java API for XML-based RPC (JAX-RPC) 1.1
  • SOAP with Attachments API for Java (SAAJ) v1.2
  • JavaServer Pages Standard Tag Library (JSTL) 1.0.3
  • Java WSDP Registry Server 1.0_06
  • Ant Build Tool 1.5.4
  • Apache Tomcat 5.0 dev container

Version 1.3 now supports the WS-I basic profile 1.0. It also fixes assorted bugs.


RenderX has released version 3.6.3 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. Version 3.6.3 is a bug fix release. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $4999.95. Updates from 3.0 are free.

Saturday, October 25, 2003

Toni Uusitalo has posted Parsifal 0.7.3, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Parsifal doesn't yet catch all the well-formedness errors it should, but unlike a lot of so-called fast parsers the author does seem to realize this is important, and is working on fixing the problems. I can't recommend this parser just yet, but by the time it hits 1.0, it may be a worthy addition to the C programmer's toolbox. This release fixes assorted bugs. Parsifal is in the public domain.

Friday, October 24, 2003

Frederic Laurent has released Lantern 1.0, a free GUI program written in Java that allows users to load XML documents and then test XPath expressions against those documents. Lantern is published under the GPL. The web page is in French.

Thursday, October 23, 2003

Antenna House, Inc. has released XSL Formatter 3.0 for Linux and Windows. For the first time, a Solaris version is also available. Version 3.0 can format much longer documents, speed is increased, PDF output is now bundled, SVG support has been added, and many more languages can be hyphenated. XSL Formatter is $1250 payware for a single user Windows license, plus another $100 if you want hyphenation support, plus royalty fees if you want GIF or TIFF support. Linux/Unix prices start at $3000.

Antenna House has also posted a maintenance release for XSL Formatter 2.5 that fixes a few bugs.


Opera Software has released version 7.21 of its namesake web browser for Windows. Opera is $39 payware or free-beer adware. Opera supports XML, CSS, HTML, and various other acronyms but not XSLT. Version 7.21 is a bug fix release.

Wednesday, October 22, 2003

Cool discovery of the day: some mailing lists including those at the W3C are now using an X-Archived-At: header to give a permanent URL for the message. I can key off of this to make better links for quotes of the day from e-mail messages, as well as to link to recommended reading in e-mail. I encourage other mailing lists to adopt this excellent practice. Kudos to the W3C. I wish I'd noticed this months ago.
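
Pulling the permanent URL out of a message takes only a few lines with Python's standard email package. This sketch assumes the header value is a URL, optionally wrapped in angle brackets. The archived_url helper and the sample message are hypothetical.

```python
from email import message_from_string

# Hypothetical sample message.
raw = (
    "From: someone@example.org\n"
    "Subject: Example\n"
    "X-Archived-At: <http://lists.example.org/archives/2003Oct/0042.html>\n"
    "\n"
    "Body text.\n"
)

def archived_url(raw_message):
    """Hypothetical helper: return the X-Archived-At URL, or None if the header is absent."""
    msg = message_from_string(raw_message)
    value = msg.get("X-Archived-At")
    if value is None:
        return None
    return value.strip().strip("<>")  # assumption: URL may be enclosed in <...>

print(archived_url(raw))
# http://lists.example.org/archives/2003Oct/0042.html
```
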


The W3C has released the second edition of MathML 2.0, an XML application for describing mathematical equations and data in both semantic and presentational forms. This is mainly an editorial rewrite of the spec. It's not significantly different from MathML 2.0 in most respects.

Tuesday, October 21, 2003

Daniel Veillard has released version 2.6.0 of libxml2, the XML C library for Gnome. This release fixes assorted bugs, is smaller overall, and should run faster. It can now successfully handle my one-element parser breaking document at this URL. As far as I know, it's the first shipping parser to get this one right (though I haven't tested MSXML on this document yet).


Sun has released StarOffice 7.0, an $80 payware office suite for Windows, Linux, and Solaris based on the open source OpenOffice that uses XML as its native format. StarOffice includes a word processor, spreadsheet, presentation program, drawing program, and a database. It's available in English, French, German, Italian, Spanish, Swedish, Simplified Chinese, Traditional Chinese, Japanese, and Korean. New features in version 7 include better filters for importing and exporting Microsoft Office documents, Flash and PDF export, and fonts for Asian languages such as Japanese and Chinese. Only electronic downloads are available at the moment. Boxed copies will be on store shelves in a few weeks.

Monday, October 20, 2003

Topologi has released version 1.1.6 of the Collaborative Markup Editor, its XML/SGML text editor. CME provides features for marking up raw and non-well-formed text. This release "breaks the million-line file barrier." CME costs $59.95, and runs on Windows, Linux and Mac OS X.

Sunday, October 19, 2003

Mono 0.28, an open source implementation of the .NET Development Framework, has been released. New XML features in this release include:

  • A complete Web Services stack with much more WSDL support
  • genxs, a tool that can generate an XML serializer for any type and customize it.
  • Partial XML Schema validation support in XmlValidatingReader
  • Much faster XSLT and XPath

Mono is available for Unix and Windows, at least until such time as Microsoft decides to kill it with patent suits. :-(


Dare Obasanjo has released version 1.0 of EXSLT.NET, an implementation of EXSLT extensions to XSLT for the .NET platform. EXSLT.NET implements the Dates and Times, Common, Math, Regular Expressions, Sets and Strings EXSLT modules.

Saturday, October 18, 2003

Syntext has posted the third beta of Serna, an XSL-based WYSIWYG XML Document Editor for Windows and Linux. This beta is now feature complete, including spell checking. Registration is required for the beta. Pricing has not yet been announced.

Friday, October 17, 2003

The W3C Resource Description Framework (RDF) Core Working Group has published six revised working drafts:

RDF Primer
According to the introduction, RDF "is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a 'Web resource', RDF can also be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web. Examples include information about items available from on-line shopping facilities (e.g., information about specifications, prices, and availability), or the description of a Web user's preferences for information delivery. RDF is intended for situations in which this information needs to be processed by applications, rather than being only displayed to people. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created."
Resource Description Framework (RDF): Concepts and Abstract Syntax
"This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. This abstract syntax is quite distinct from XML's tree-based infoset [XML-INFOSET]. It also includes discussion of design goals, key concepts, datatyping, character normalization and handling of URI references."
RDF Semantics
"This is a specification of a precise semantics, and corresponding complete systems of inference rules, for the Resource Description Framework (RDF) and RDF Schema (RDFS)."
RDF Vocabulary Description Language 1.0: RDF Schema
"This specification describes how to use RDF to describe RDF vocabularies. This specification defines a vocabulary for this purpose and defines other built-in RDF vocabulary initially specified in the RDF Model and Syntax Specification. "
RDF/XML Syntax Specification (Revised)
"This document defines an XML syntax for RDF called RDF/XML in terms of Namespaces in XML, the XML Information Set and XML Base. The formal grammar for the syntax is annotated with actions generating triples of the RDF graph as defined in RDF Concepts and Abstract Syntax. The triples are written using the N-Triples RDF graph serializing format which enables more precise recording of the mapping in a machine processable form. The mappings are recorded as tests cases, gathered and published in RDF Test Cases."
RDF Test Cases
This document describes a set of machine-processable test cases for RDF though it does not contain the test cases themselves which are available separately.

Changes since the last draft are relatively minor. They include forbidding the use of control characters in RDF URI references and weakening the requirement for Unicode Normalization Form C from a MUST to a SHOULD. Comments on all six are due by November 7.
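
For readers who haven't seen N-Triples, each line holds one triple: subject, predicate, and object, followed by a period, with URI references in angle brackets and literals in double quotes. Here is a simplified Python sketch of a serializer. The helper names are hypothetical, and it escapes only backslashes, quotes, and line breaks; it does not handle language tags, datatypes, blank nodes, or escaping of non-ASCII characters.

```python
def escape_literal(text):
    """Hypothetical helper: escape a string for an N-Triples plain literal (simplified)."""
    return (text.replace("\\", "\\\\")  # backslash first, so later escapes aren't doubled
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def ntriple(subject, predicate, obj, literal=False):
    """Hypothetical helper: return one N-Triples line for a triple whose subject and predicate are URIs."""
    o = f'"{escape_literal(obj)}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {o} ."

print(ntriple("http://example.org/page",
              "http://purl.org/dc/elements/1.1/title",
              "My Page", literal=True))
# <http://example.org/page> <http://purl.org/dc/elements/1.1/title> "My Page" .
```
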

Thursday, October 16, 2003

The W3C HTML Working Group has released the final recommendation of XML Events, a module that "provides XML languages with the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 event interfaces [DOM2EVENTS]. The result is to provide an interoperable way of associating behaviors with document-level markup."


The W3C has released XForms 1.0. According to the abstract,

XForms is an XML application that represents the next generation of forms for the Web. By splitting traditional XHTML forms into three parts—XForms model, instance data, and user interface—it separates presentation from content, allows reuse, gives strong typing—reducing the number of round-trips to the server, as well as offering device independence and a reduced need for scripting.

XForms is not a free-standing document type, but is intended to be integrated into other markup languages, such as XHTML or SVG.

The XForms Working Group has also posted a candidate recommendation of XForms 1.0 Basic Profile. "The XForms Basic Profile describes a minimal level of XForms processing tailored to the needs of constrained devices and environments."


Brian Quinlan has posted version 0.8.1 of Pyana, a Python interface to the Xalan C XSLT processor. This is a bug fix release.


RenderX has released version 3.6.2 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. Version 3.6.2 is a bug fix release. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $4999.95. Updates from 3.0 are free.

Wednesday, October 15, 2003

The Mozilla Project has released Mozilla 1.5, the open source web browser that supports XML, CSS, XSLT, XUL, HTML, XHTML, MathML, SVG, and lots of other crunchy XML goodness. It's available for Linux, Mac OS X, and Windows. Besides bug fixes, improvements in 1.5 include:

  • A spellchecker for MailNews and Composer
  • MailNews can print the attachments list
  • An overhauled ChatZilla
  • MailNews users can add header lines to every message sent out via a certain identity.
  • Users can now mark messages as read by date in MailNews.
  • Gecko supports setting color for <HR> and <BR>
  • View source displays line and column numbers in the status bar.
  • Unstyled XML display has been improved.

The Mozilla Project has also posted Firebird 0.7, a standalone web browser based on the same code as Mozilla that does not include chat, e-mail, news reader, or a kitchen. New features in 0.7 include automatic downloads and web panels.


Oracle has released several new versions of their XML Development Kits on a web site that's incompatible with about half of the browsers I tested. First off there's a beta of 10.1.0.1.0A for Windows and Solaris. New features in this release include:

  • XSLT Virtual Machine
  • XSLT 2.0
  • JAXB Class Generator
  • DOM 3.0 Load and Save, and DOM 3.0 Validation support
  • SAX Streaming XML Schema Validation Interfaces
  • JAXP 1.2 support including W3C XML schema language validation
  • New C++ XML APIs that provide unified DOM support for XMLType
  • A New XMLSAXSerializer provides support to handle the SAX output serialization

In addition, they've posted a bug fix release in the 9.2 series, version 9.2.0.6.0. I'm not sure what bugs they've fixed. Certainly none of the ones that prevent XOM from using their parser. This release is still flunking 39 separate unit tests. The 10.0 beta does a somewhat better job. It only flunks eight tests, mostly involving incorrectly resolving relative URIs. For instance, it cannot parse this URL. (In fairness, neither can most other parsers, including libxml and all versions of Xerces prior to 2.6.) The parser also doesn't handle ignorable white space according to the SAX spec. These XDKs are not open source. Registration is required.
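
Relative URIs in XML, such as the system identifier of a DTD, are resolved against the base URI of the entity that contains them, not against the current working directory. Current versions of Python's urllib.parse.urljoin follow the RFC 3986 resolution algorithm, so it makes a quick reference for what the correct answers look like. The base URI and references here are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base URI of the document containing the references.
base = "http://www.example.com/xml/docs/chapter1.xml"

for ref in ["chapter2.xml", "../dtds/book.dtd", "/index.xml", "#section2"]:
    print(ref, "->", urljoin(base, ref))
# chapter2.xml -> http://www.example.com/xml/docs/chapter2.xml
# ../dtds/book.dtd -> http://www.example.com/xml/dtds/book.dtd
# /index.xml -> http://www.example.com/index.xml
# #section2 -> http://www.example.com/xml/docs/chapter1.xml#section2
```
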

Tuesday, October 14, 2003

David K. Levine is working on Jex, a WYSIWYG equation editor for OpenOffice 1.1 that can output both TeX and MathML. Java 1.4 is required. Jex is in the public domain.

Monday, October 13, 2003

Adobe has released version 3.0.1 of their SVG browser plug-in for Windows 98 and later. This release fixes a number of security holes in the plug-in. All users are advised to upgrade.


Alexandre Brillant has released JXMLPad 1.9.5, a €90 shareware JavaBean component for editing XML. Java 1.2 or later is required.

Sunday, October 12, 2003

Dennis Sosnoski has posted the third beta of JiBX, yet another open source framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding document camp as opposed to the schema driven binding frameworks like JaxMe and JAXB. I haven't explored JiBX in depth, but in general I do find the APIs based on custom binding documents to be more flexible and potentially useful than those based on a schema. Quoting from the JiBX web site,

JiBX is a framework for binding XML data to Java objects. It lets you work with data from XML documents using your own class structures. The JiBX framework handles all the details of converting your data to and from XML based on your instructions. JiBX is designed to perform the translation between internal data structures and XML with very high efficiency, but still allows you a high degree of control over the translation process.

How does it manage this? JiBX uses binding definition documents to define the rules for how your Java objects are converted to or from XML (the binding). At some point after you've compiled your source code into class files you execute the first part of the JiBX framework, the binding compiler. This compiler enhances binary class files produced by the Java compiler, adding code to handle converting instances of the classes to or from XML. After running the binding compiler you can continue the normal steps you take in assembling your application (such as building jar files, etc.).

The second part of the JiBX framework is the binding runtime. The enhanced class files generated by the binding compiler use this runtime component both for actually building objects from an XML input document (called unmarshalling, in data binding terms) and for generating an XML output document from objects (called marshalling). The runtime uses a parser implementing the XMLPull API for handling input documents, but is otherwise self-contained.

This beta mostly fixes bugs and improves the documentation.
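
JiBX's binding compiler rewrites class files, which has no direct Python equivalent. Still, the core idea of a mapping between your own classes and XML element names that lives outside those classes can be sketched. This is a hypothetical runtime-only illustration, not JiBX's binding format or API.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Person:
    name: str
    email: str

# Hypothetical binding: which XML element each field maps to.
BINDING = {"name": "full-name", "email": "mail"}

def marshal(person: Person) -> str:
    """Object -> XML, driven by the external BINDING table."""
    root = ET.Element("person")
    for field, tag in BINDING.items():
        ET.SubElement(root, tag).text = getattr(person, field)
    return ET.tostring(root, encoding="unicode")

def unmarshal(xml_text: str) -> Person:
    """XML -> object, driven by the same BINDING table."""
    root = ET.fromstring(xml_text)
    return Person(**{field: root.findtext(tag) for field, tag in BINDING.items()})

p = Person("Ada Lovelace", "ada@example.org")
xml_text = marshal(p)
print(xml_text)
# <person><full-name>Ada Lovelace</full-name><mail>ada@example.org</mail></person>
assert unmarshal(xml_text) == p
```

Because the mapping lives outside the Person class, the same class could be bound to a different vocabulary by swapping the table. That is roughly the flexibility custom-binding frameworks offer over classes generated from a schema.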

Saturday, October 11, 2003

Dennis Sosnoski has posted XML Binary Infoset Encoding (XBIS) 0.9, a Java class library for converting XML documents into non-XML binary documents that he claims are faster to parse and generate by factors of four to eight. However, I don't believe those numbers for a minute. Sosnoski's benchmarks are comparing apples to oranges. His SAX numbers use parsers that perform well-formedness checking, and the time required to check well-formedness is included in the input time. By contrast, the XBIS parser relies on an underlying SAX parse to check well-formedness before the XBIS input data is generated, but the XBIS numbers do not include the time spent on this parsing, including well-formedness checking. In fact, I would not be at all surprised if, once the well-formedness checking cost were included in the XBIS numbers, XBIS proved slower than real XML parsing. There's also some question about how well optimized and representative the SAX output code is. XBIS is published under a BSD license if you want to check it out for yourself.

Friday, October 10, 2003

The W3C Internationalization Working Group has posted the first public working draft of Authoring Techniques for XHTML & HTML Internationalization 1.0. "The document provides practical techniques that HTML content authors can use to ensure that their HTML is easily adaptable for an international audience. These are techniques that need to be addressed from the start of content development if unnecessary costs and resource issues are to be avoided later on. They are aimed at the developer as well as the localizer."

Thursday, October 9, 2003

Logilab has posted XMLdiff 0.64, an open source (GPL) Python "tool that figures out the differences between two similar XML files, in the same way the diff utility does it for text files." It can be used as a library or as a command line tool. The library can operate on either XML files or DOM trees. This is at least the third tool I've encountered with the name "xmldiff". I think it's time to come up with some new names.
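
To make the idea concrete, here is a deliberately naive tree comparison in Python. It is not Logilab's algorithm or API. It compares children by position and reports tag, text, and attribute changes. A real diff tool computes a minimal edit script instead, so that one inserted element doesn't make everything after it look changed.

```python
import xml.etree.ElementTree as ET

def diff(a, b, path=""):
    """Yield human-readable differences between two element trees.

    Children are compared by position; tail text is ignored.
    """
    here = f"{path}/{a.tag}"
    if a.tag != b.tag:
        yield f"{here}: tag changed to {b.tag}"
        return
    if (a.text or "").strip() != (b.text or "").strip():
        yield f"{here}: text {a.text!r} -> {b.text!r}"
    for key in sorted(set(a.attrib) | set(b.attrib)):
        if a.get(key) != b.get(key):
            yield f"{here}/@{key}: {a.get(key)!r} -> {b.get(key)!r}"
    kids_a, kids_b = list(a), list(b)
    for child_a, child_b in zip(kids_a, kids_b):
        yield from diff(child_a, child_b, here)
    for extra in kids_a[len(kids_b):]:
        yield f"{here}: removed <{extra.tag}>"
    for extra in kids_b[len(kids_a):]:
        yield f"{here}: added <{extra.tag}>"

old = ET.fromstring('<doc version="1"><p>Hello</p></doc>')
new = ET.fromstring('<doc version="2"><p>Hello</p><p>World</p></doc>')
for line in diff(old, new):
    print(line)
# /doc/@version: '1' -> '2'
# /doc: added <p>
```
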


Dave Malcolm has posted Conglomerate 0.7.5, an open source GUI XML editor for Linux written in C, and based on libxml2 and the GTK+ and Gnome libraries. This release adds unlimited undo/redo, a Czech localization, and various small user interface improvements and bug fixes. Conglomerate is published under the GPL.

Wednesday, October 8, 2003

Toni Uusitalo's posted Parsifal 0.7.2, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. This release fixes assorted bugs and cleans up the code. Parsifal is in the public domain.

Tuesday, October 7, 2003

Bill Venners has posted the final installment of his interview with me about XML and XOM at last year's Software Development West conference. In this piece, I discuss how to handle invalid documents.


Sam Tregar has released XML::Validator::Schema 1.04, a Perl module that validates XML documents against a partial subset of the W3C XML schema language. It is implemented as a SAX filter on top of XML::SAX. 1.04 adds support for minOccurs and maxOccurs with sequence, choice, and all groups.

Monday, October 6, 2003

The W3C Quality Assurance (QA) Activity has posted a revised candidate recommendation of QA Framework: Operational Guidelines. This document "presents operational and procedural guidelines for groups undertaking conformance materials development." It's not immediately obvious what change necessitated this release 10 days after the last version.

Sunday, October 5, 2003

The W3C HTML Working Group has released the second Last Call Working Draft of Modularization of XHTML in XML Schema. This spec provides a complete set of W3C XML Schema Language modules for XHTML, and allows document authors to modify and extend XHTML to build new, non-strictly conforming XHTML documents. Comments are due by November 14.

Saturday, October 4, 2003

Syntext has posted the second beta of Serna, an XSL-based WYSIWYG XML Document Editor for Windows and Linux. Registration is required for the beta. Pricing has not yet been announced.


Sam Tregar has released XML::Validator::Schema 1.03, a Perl module that validates XML documents against a partial subset of the W3C XML schema language. It is implemented as a SAX filter on top of XML::SAX.


Toni Uusitalo has posted Parsifal 0.7.2, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. This release makes various internal changes that may improve the speed somewhat. Parsifal is in the public domain.


Dave Malcolm has posted Conglomerate 0.7.4, an open source GUI XML editor for Linux written in C, and based on libxml2 and the GTK+ and Gnome libraries. Conglomerate is published under the GPL.


IBM has updated their XML Parser for Java to version 4.2.2. This release is based on Xerces-J 2.4.0 and supports the W3C XML Schema Recommendation 1.0, SAX 1.0 and 2.0, DOM Level 1, DOM Level 2, and some experimental features of DOM Level 3 Core and Load/Save Working Drafts, JAXP 1.2, and XNI. 4.2.2 is a bug fix release.

Friday, October 3, 2003

I've fixed a bug in the Permalink code. It should be working again.


The Apache Project has released Cocoon 2.1.2, an open source "web development framework built around the concepts of separation of concerns and component-based web development. Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming." Cocoon can assemble data from many sources including filesystems, SQL databases, LDAP, native XML databases, and SAP. It can customize the output to generate HTML, WML, PDF, SVG, and RTF from the same inputs. Processes it supports include XSL transformation and XInclude resolution. Cocoon can run as a servlet inside an existing web server or standalone through a commandline interface.
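
The pipeline idea can be sketched as ordinary function composition: a generator stage produces XML, transformer stages rewrite it, and a serializer stage emits output. Cocoon itself streams SAX events between components and wires pipelines together in its sitemap. This Python sketch instead passes whole ElementTree trees between hypothetical stage functions.

```python
from functools import reduce
import xml.etree.ElementTree as ET

def generator(source: str) -> ET.Element:
    """Generator stage: turn raw input into an XML tree."""
    return ET.fromstring(source)

def uppercase_titles(tree: ET.Element) -> ET.Element:
    """Transformer stage: a stand-in for an XSLT step."""
    for title in tree.iter("title"):
        title.text = (title.text or "").upper()
    return tree

def to_html(tree: ET.Element) -> ET.Element:
    """Transformer stage: map the document vocabulary onto HTML."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for title in tree.iter("title"):
        ET.SubElement(body, "h1").text = title.text
    return html

def serializer(tree: ET.Element) -> str:
    """Serializer stage: turn the final tree into output text."""
    return ET.tostring(tree, encoding="unicode")

def pipeline(*stages):
    """Compose stages left to right into a single callable."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

page = pipeline(generator, uppercase_titles, to_html, serializer)
print(page("<doc><title>Hello</title></doc>"))
# <html><body><h1>HELLO</h1></body></html>
```

Swapping to_html for a different transformer, or serializer for one that writes another format, changes the output without touching the other stages. That is the Lego-like property the Cocoon description emphasizes.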


Michael Kay has released Saxon 7.7, an experimental open source implementation of large parts of XSLT 2.0 and XPath 2.0 in Java. Version 7.7 brings the syntax into sync with the XQuery/XPath 2.0/XSLT 2.0 working drafts of August 22 (with a few features that look forward to the next draft). It also adds support for the xsl:character-map and xsl:output-character elements. Finally it fixes numerous bugs. Java 1.4 is required. Saxon is published under the Mozilla Public License 1.0.


ActiveState has posted the first beta of Visual XSLT 2.0, a $295 payware XSLT development plug-in for Visual Studio .NET. It includes an XSLT editor, XSLT debugger, template browser, and more. Version 2.0 adds a just-in-time debugger and a visual schema mapper. The Beta is free, and expires on November 1, 2003.


Chiba 0.92, a partial open source implementation of XForms, has been posted. Chiba is designed to run on the server side without using any client side scripting capabilities. Chiba is written in Java and published under the Artistic License.

Thursday, October 2, 2003

Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI) that fixes a number of bugs, especially in the HTML parser. Other tools in the package include a generic XML pull parser, a RELAX NG validator, and a DTD to XML converter.

Wednesday, October 1, 2003

The OpenOffice Project has officially released OpenOffice 1.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. I used the previous 1.0 version to write Effective XML. This is the same as release candidate five from a couple of days ago. They've just marked it final. New features since version 1.0 include:

  • PDF export
  • DocBook support
  • Much prettier fonts
  • XHTML support
  • Macro recorder
  • Support for vertical scripts like traditional Chinese and bidirectional scripts like Hebrew and Arabic

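
Because the native format is just a zip archive of XML files, the document body can be pulled out with standard tools. Here is a Python sketch. It assumes the body lives in the archive member content.xml, as it does in OpenOffice 1.x files. To keep the example self-contained, it builds a stand-in archive in memory instead of opening a real .sxw file.

```python
import io
import zipfile

def read_content_xml(path_or_file) -> str:
    """Return the content.xml member of an OpenOffice document as text."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.read("content.xml").decode("utf-8")

# Stand-in archive so the example runs without a real .sxw file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", "<document-content>stand-in</document-content>")
buffer.seek(0)

print(read_content_xml(buffer))
# <document-content>stand-in</document-content>
```

With a real file you would pass a path instead, for example read_content_xml("letter.sxw"), where the file name is hypothetical.
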
Tuesday, September 30, 2003

Norm Walsh has released version 1.62.4 of his XSLT stylesheets for DocBook. This release makes numerous small changes and bug fixes including better support for HTML tables and Latin. Nice news for the massive community of tech writers working in Latin. :-)

Monday, September 29, 2003

Amazon and Barnes & Noble have finally gotten Effective XML in stock for immediate shipment. My apologies to everyone who preordered after my initial announcement. I had no idea it was going to take so long for the copies to get from the publisher to the bookstore warehouses. Other bookstores should have it soon.


The W3C Resource Description Framework (RDF) Core Working Group has published six revised working drafts:

RDF Primer
According to the introduction, RDF "is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a 'Web resource', RDF can also be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web. Examples include information about items available from on-line shopping facilities (e.g., information about specifications, prices, and availability), or the description of a Web user's preferences for information delivery. RDF is intended for situations in which this information needs to be processed by applications, rather than being only displayed to people. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created."
Resource Description Framework (RDF): Concepts and Abstract Syntax
"This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. This abstract syntax is quite distinct from XML's tree-based infoset [XML-INFOSET]. It also includes discussion of design goals, key concepts, datatyping, character normalization and handling of URI references."
RDF Semantics
"This is a specification of a precise semantics, and corresponding complete systems of inference rules, for the Resource Description Framework (RDF) and RDF Schema (RDFS)."
RDF Vocabulary Description Language 1.0: RDF Schema
"This specification describes how to use RDF to describe RDF vocabularies. This specification defines a vocabulary for this purpose and defines other built-in RDF vocabulary initially specified in the RDF Model and Syntax Specification. "
RDF/XML Syntax Specification (Revised)
"This document defines an XML syntax for RDF called RDF/XML in terms of Namespaces in XML, the XML Information Set and XML Base. The formal grammar for the syntax is annotated with actions generating triples of the RDF graph as defined in RDF Concepts and Abstract Syntax. The triples are written using the N-Triples RDF graph serializing format which enables more precise recording of the mapping in a machine processable form. The mappings are recorded as tests cases, gathered and published in RDF Test Cases."
RDF Test Cases
This document describes a set of machine-processable test cases for RDF though it does not contain the test cases themselves which are available separately.

Comments on all six are due by February 21.
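The RDF/XML abstract mentions N-Triples, which writes one statement per line. To illustrate the kind of metadata the RDF introduction describes (here, the title of a Web page), this minimal Python sketch serializes triples by hand, escaping literals the way the ASCII-only N-Triples format in RDF Test Cases requires. The helper names and the example.org page are hypothetical. A real application would use an RDF toolkit rather than string formatting.

```python
# Hypothetical helpers for illustration only; not from any RDF library.

def uri(u: str) -> str:
    """Write a URI reference as an N-Triples term: <http://...>."""
    return f"<{u}>"


def literal(text: str) -> str:
    """Write a plain literal, escaping for the 7-bit ASCII N-Triples format."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif code < 0x20 or code > 0x7E:
            # Control and non-ASCII characters become \uXXXX or \UXXXXXXXX escapes.
            out.append(f"\\u{code:04X}" if code <= 0xFFFF else f"\\U{code:08X}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """One statement per line, terminated by ' .'."""
    return f"{subject} {predicate} {obj} ."


DC = "http://purl.org/dc/elements/1.1/"
page = uri("http://www.example.org/index.html")
print(triple(page, uri(DC + "title"), literal("Extreme Markup Languages, Montréal")))
# <http://www.example.org/index.html> <http://purl.org/dc/elements/1.1/title> "Extreme Markup Languages, Montr\u00E9al" .
```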

Sunday, September 28, 2003

Norm Walsh has posted the third beta of DocBook 4.3, the XML application for technical documentation that I used to write Processing XML with Java. The changes are mostly minor and all backwards compatible. They include a syntax attribute on the synopsis and programlisting elements, xml:base support, and bug fixes.


The Mozilla Project has posted the second release candidate of Mozilla 1.5, the open source web browser that supports XML, CSS, XSLT, XUL, HTML, XHTML, MathML, SVG, and lots of other crunchy XML goodness. It's available for Linux, Mac OS X, and Windows. Besides bug fixes, improvements in 1.5 include:

  • Unstyled XML display has been improved.
  • A spellchecker for MailNews and Composer.
  • MailNews can print the attachments list.
  • An overhauled ChatZilla.
  • MailNews users can add header lines to every message sent out via a certain identity.
  • Users can now mark messages as read by date in MailNews.
  • Gecko supports setting color for <HR> and <BR>.
  • View source displays line and column numbers in the status bar.

Alexandre Brillant has released JXP 1.3.1, a €50 shareware XPath 1.0 API that can be customized to fit different object models. This release adds cache support for the navigator, allows DOM Documents to be used instead of Elements, and fixes bugs.

Saturday, September 27, 2003

XML Benchmark 1.2.1 is a C/C++/Java toolset for benchmarking XML parsers including libxml2, Xerces, Oracle XDK, Expat, RXP, QT, and Crimson. Benchmarks include parsing (native, SAX, DOM), DOM manipulation, schema validation, XSL transformation, and XML signature and encryption. Version 1.2.1 adds support for XML Security 1.1.x.

I've learned to treat benchmarks with a 20-pound bag of rock salt until proven otherwise. However, this product gets at least one thing right. It lets you plug in "Any valid XML file" so you can test parsers on the kind of documents you're interested in rather than on whatever documents the benchmark vendor chose. Most parsers exhibit wildly varying performance characteristics depending on the type of XML document (large or small, record-like or narrative, many attributes or few attributes, etc.). It's not clear whether this toolset can test documents that are well-formed but invalid.
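Testing parsers on your own documents is easy to do even without a dedicated toolset. Here is a minimal Python sketch that uses only standard-library parsers, not the ones XML Benchmark covers. It times each parser on whatever document you load. The file is read into memory first so disk I/O isn't counted, and the best of several runs is reported to reduce noise.

```python
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

# Each entry parses an in-memory document; the SAX entry discards all events.
PARSERS = {
    "sax": lambda data: xml.sax.parseString(data, xml.sax.ContentHandler()),
    "minidom": lambda data: xml.dom.minidom.parseString(data),
    "etree": lambda data: ET.fromstring(data),
}


def benchmark(data: bytes, repeats: int = 5) -> dict:
    """Return the best wall-clock time in seconds for each parser on `data`."""
    results = {}
    for name, parse in PARSERS.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            parse(data)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results
```

For example, `benchmark(Path("orders.xml").read_bytes())` (with `from pathlib import Path`) times all three parsers on one of your own files. The file name is only a placeholder. Running it over a varied set of documents matters more than any single number.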


Alexandre Brillant has released FastParser 1.6.2, a $50 shareware, non-validating, SAX parser for Java. Version 1.6.2 fixes various bugs and makes some optimizations.

Brillant claims this parser is faster than Xerces and Crimson (which are known not to be the fastest parsers out there). However, his benchmarks only test one file, and it's not clear from his results whether FastParser was used in a mode that doesn't perform full well-formedness checking.


Friday, September 26, 2003

Pekka Enberg has posted version 0.2.11 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This is a bug fix release.


Opera Software has released version 7.20 of its namesake web browser for Windows. Opera is $39 payware or free-beer adware. Opera supports XML, CSS, HTML, and various other acronyms. New features in version 7.20 include bidirectional text layout for Hebrew and Arabic and the data URL protocol for Base64 encoded content. Many small bugs are fixed as well. On the minus side, version 7.20 adds support for the marquee and blink tags, which can only make one wonder what kind of crack is being smoked in Norway these days. Most of the world recognized these tags were bad ideas in 1995.
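A data URL carries its content inline, using the syntax defined in RFC 2397. This short Python sketch builds and decodes the base64 form. The function names are hypothetical, and only base64 data URLs are handled.

```python
import base64


def make_data_url(payload: bytes, media_type: str = "text/plain") -> str:
    """Build a base64 data: URL, e.g. data:text/plain;base64,...."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def read_data_url(url: str) -> bytes:
    """Decode a base64 data: URL; percent-encoded (non-base64) data URLs are not handled."""
    header, _, encoded = url.partition(",")
    if not (header.startswith("data:") and header.endswith(";base64")):
        raise ValueError("not a base64 data: URL")
    return base64.b64decode(encoded)


url = make_data_url(b"<p>Hello</p>", "text/html")
```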

Thursday, September 25, 2003

The OpenOffice Project has posted the fifth release candidate of OpenOffice 1.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. I used the previous 1.0 version to write Effective XML (which should be arriving in stores any day now, really. I apologize to anybody who ordered it earlier. I got my first copy last week, and I don't know why the stores don't have it yet. I am checking with my publisher. :-( ) According to the announcement, "The build includes bug fixes and is speedier and more robust. No new features have been introduced since RC4. Just bug fixes. We thank you for helping us find and fix them; the work the user community has done has been invaluable. And you should continue with that work. RC5 is very close to finished but it really needs for people to test it and find any flaws that may exist. Please download it, use it, and if you come across something, file a bug report "


Uche Ogbuji has posted Anobind 0.6.0, an open source data binding API for Python. Anobind converts an XML document into a data structure of corresponding Python objects. Anobind is driven by declarative rules that describe how the XML is bound to the Python objects. Anobind is published under the 4Suite variant of the Apache license. It requires Python 2.2.2 and 4Suite 1.0a3.
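Anobind's own rule language isn't shown here. To give a rough idea of what XML data binding means, here is a hypothetical Python sketch built on ElementTree rather than Anobind or 4Suite. Attributes and child elements become Python attributes. A single declarative rule, the set of element names that may repeat, decides which children are bound to lists. Namespaces and mixed content are ignored.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def bind(element, repeated=frozenset()):
    """Bind an element to Python objects (hypothetical sketch, not Anobind's API).

    - An element with no attributes and no children becomes its text (a str).
    - Otherwise it becomes a SimpleNamespace holding its XML attributes and
      child elements; the text of such elements is ignored.
    - Children whose tag is listed in `repeated` are collected into lists.
    """
    if len(element) == 0 and not element.attrib:
        return element.text or ""
    obj = SimpleNamespace()
    for name, value in element.attrib.items():
        setattr(obj, name, value)
    for child in element:
        value = bind(child, repeated)
        if child.tag in repeated:
            current = getattr(obj, child.tag, None)
            if not isinstance(current, list):
                current = []
                setattr(obj, child.tag, current)
            current.append(value)
        else:
            setattr(obj, child.tag, value)
    return obj


doc = ET.fromstring(
    '<book isbn="0-321-15040-6"><title>Effective XML</title>'
    "<chapter>One</chapter><chapter>Two</chapter></book>"
)
book = bind(doc, repeated={"chapter"})
```

With this rule, `book.title` is a string and `book.chapter` is always a list, even for a book with a single chapter. That predictability is the main reason to declare the binding instead of inferring it from each document.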


Alexandre Brillant has released JXMLPad 1.9.4, a €90 shareware JavaBean component for editing XML. Java 1.2 or later is required.

Wednesday, September 24, 2003

James Clark has launched the relaxng-user mailing list for discussing the RELAX NG schema language. To subscribe to the list, send an e-mail to relaxng-user-subscribe@relaxng.org.


RenderX has released version 3.6.1 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. Version 3.6.1 fixes a bug with SVG input. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $4999.95. Updates from 3.0 are free.

Tuesday, September 23, 2003

CubeWerx has released CWXML, an open-source C-language library for parsing and generating XML, but really what they're interested in is BXML, yet another binary format that pollutes the XML brand. Guys, what you've created is not XML. Please don't call it that.

A report by Dr. Craig S. Bruce makes the usual batch of claims about the non-XML binary format being smaller and faster to parse than real XML. As usual, the actual data tested is far too small to support any real assertions about whether this is true. Unusually, they did include a large narrative document as one of their two test cases; most people offering systems like this ignore narrative documents completely. Even setting aside the very limited number of test cases, though, the numbers in their report don't support their claims. They show that gzipped XML is smaller than their binary format, and that when you gzip their format the size difference between the two is trivially small. They make bigger claims for speed, but the parse times involved are so small to begin with that they really don't matter. Reducing parsing from 0.044 seconds to 0.005 seconds may be an order of magnitude speed up, but it's hardly a significant gross savings: 0.039 seconds per parse.
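You can check size claims like these against your own documents in a few lines. This Python sketch uses the standard `gzip` module. The sample document is invented, and a fair comparison would also run the same documents through the binary format.

```python
import gzip


def size_report(document: bytes) -> tuple:
    """Return (raw bytes, gzipped bytes, compressed/raw ratio) for one document."""
    compressed = gzip.compress(document)
    return len(document), len(compressed), len(compressed) / len(document)


# A made-up record-like document; substitute your own files.
records = (
    b"<records>"
    + b"<r><name>widget</name><price>9.99</price></r>" * 1000
    + b"</records>"
)
raw, packed, ratio = size_report(records)
```

Repetitive markup like this compresses very well. Narrative documents compress less, which is one more reason to test more than one kind of document.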

As usual in this field, it appears the researchers have rigged the game by assuming a more homogeneous and thus more easily optimized world than XML actually supports. They only support Latin-1, and it looks probable that they aren't performing all the checks an XML parser is required to perform on characters. They also make the common mistake of encoding numbers as binary for speed. It's possible to make this work fast for one platform, but every optimization you perform for that platform exacts a comparable slowdown on other platforms with incompatible binary numeric formats.
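The byte-order problem can be shown in miniature. This Python sketch is illustrative only and is not taken from BXML. It packs the same 32-bit integer in big-endian order. A reader that assumes little-endian order gets a different number. A reader on a machine with the other byte order has to swap bytes, and that swap is the cost pushed onto other platforms. The textual form needs no such agreement.

```python
import struct


def encode_int(value: int, byte_order: str) -> bytes:
    """Pack a signed 32-bit integer; byte_order is '>' (big-endian) or '<' (little-endian)."""
    return struct.pack(byte_order + "i", value)


def decode_int(data: bytes, byte_order: str) -> int:
    """Unpack a signed 32-bit integer using the given byte order."""
    return struct.unpack(byte_order + "i", data)[0]


big_endian = encode_int(1000, ">")       # b'\x00\x00\x03\xe8'
wrong = decode_int(big_endian, "<")      # -402456576: bytes read in the wrong order
right = decode_int(big_endian, ">")      # 1000, after the reader honors the byte order
as_text = int(b"1000".decode("ascii"))   # 1000, with no byte-order agreement needed
```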

One thing I'm beginning to think is necessary in this field is a large collection of standard test cases that use the full panoply of XML: DTDs, entity references, CDATA sections, attributes, elements, narrative content, record-like data, recursive data, namespaces, multiple encodings, and more. This benchmark set would be useful not just for testing binary formats, but also for testing parser performance. All too often benchmarks in this field are based on just one or two documents. It's rare to see benchmarks that cover a broad and deep enough collection of documents to justify themselves.

Monday, September 22, 2003

I'm experimenting with permalinks for the daily news on this site. I've written an XSLT stylesheet that extracts the news and stores it into a daily file. Currently a cron job regenerates the data every hour, which means the link can be broken or out of date for as much as 59 minutes. I really need a better way to automatically recognize when the main index file has changed and only then regenerate the news extract without manual intervention. This is all running on Linux. Anyone have an idea?
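Here is one possible approach, sketched in Python on the assumption that the existing XSLT extraction can be wrapped in a callable. Cron runs a small script every minute. The script hashes the index file and regenerates the extract only when the hash differs from the one recorded on the previous run. The file names and the `regenerate` callback are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, so touching the file without changing it is ignored."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def regenerate_if_changed(index: Path, state: Path,
                          regenerate: Callable[[Path], None]) -> bool:
    """Run `regenerate` only if `index` changed since the digest stored in `state`.

    Returns True if the extract was regenerated.
    """
    digest = file_digest(index)
    previous = state.read_text().strip() if state.exists() else None
    if digest == previous:
        return False
    regenerate(index)
    # Record the digest only after regeneration succeeds, so a failure is retried next run.
    state.write_text(digest)
    return True
```

The `regenerate` callback could invoke the existing XSLT processor via `subprocess.run`. Because an unchanged file costs only one hash per run, the check can run far more often than the current hourly rebuild.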


I've posted XOM 1.0d21, my tree-based streaming API for processing XML with Java. This release collects a large number of small fixes and improvements: more standard XInclude results, exceptions that behave properly in Java 1.4, cycle detection when building documents, non-final copy methods, a checkDetach() method, and more. There are API level changes in this release but most of them are backwards compatible, and the ones that aren't are in fairly obscure areas of the API. Almost all 1.0d20 code should run just fine in 1.0d21 with maybe just a recompile. Details on the web page.

Sunday, September 21, 2003

Oleg Tkachenko has released XInclude.NET, an open source implementation of the XInclude 1.0 Candidate Recommendation written in C# for the .NET platform. This supports the XPointer framework, shorthand pointers, element(), xmlns() and xpath1() schemes. The .NET Framework 1.0 is required.


Dave Malcolm has posted Conglomerate 0.7.2, an open source GUI XML editor for Linux written in C, and based on libxml2 and the GTK+ and Gnome libraries. Conglomerate is published under the GPL.


The W3C Timed Text Working Group has posted Timed Text (TT) Authoring Format 1.0 Use Cases and Requirements. "A timed text authoring format is a content type that represents timed text media for the purpose of interchange among authoring systems. Timed text is textual information that is intrinsically or extrinsically associated with timing information." For example, this could be used for subtitles or closed captioning.

Saturday, September 20, 2003

Version 2.0.4 of the payware <Oxygen/> XML editor has been released. Oxygen supports XML, XSL, DTDs, and the W3C XML Schema Language. New features in version 2.0.4 include:

  • Editing remote files via FTP
  • Better presentation for errors and find results.
  • Link to specification for XML Schema errors.
  • Improved validation support for Relax NG schemas.
  • Content completion driven by Relax NG Schemas.
  • Replace string accepts new lines.
  • Improved console support.
  • Configurable editor background color.
  • XML perspective and XML project support for Eclipse.
  • Shortcuts for actions in Eclipse plugin.
  • Indent/unindent selection in Eclipse plugin.
  • Escape selection in Eclipse plugin.
  • Import from HTML available in Eclipse.

Oxygen requires Java 1.3 or later. It costs $74.

Friday, September 19, 2003

Today is the official release date for Effective XML. Later today it will be arriving at fine computer book stores around the country. Amazon, BookPool, and Barnes & Noble don't show it in stock yet, but that should change soon. Amazon in particular generally sells out of their initial shipments of my books very quickly, so you may wish to preorder it. The list price is $44.99, but most stores are offering their usual discounts. Amazon is currently 30% off. BookPool is running a special on Addison-Wesley books right now and has it at 45% off, only $24.95. That's the lowest price I've found, and it won't last so order now.

For new visitors to this site who haven't heard of this before, Effective XML is a collection of guidelines and best practices for using XML. It focuses on using and developing XML applications, with a particular emphasis on aspects of XML that are often misunderstood or misapplied. It follows the path blazed by books like Scott Meyers' Effective C++ and Joshua Bloch's Effective Java.

Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime. While many books have been written that teach developers how to use the basic syntax of XML, this is the first one that really focuses on how to use XML well. This book is not a tutorial. It is not going to teach you what a tag is or how to write a DTD. I assume you know these things. Instead it's going to tell you when, why, where, and how to use such tools effectively (and perhaps equally importantly when not to use them).

This book derives directly from my own experiences teaching and writing about XML. Over the last five years, I've written several books and taught numerous introductory courses about XML syntax, APIs, and tools. Increasingly I'm finding that audiences are already familiar with the basics of XML. They know what a tag is, how to validate a document against a DTD, and how to transform a document with an XSLT style sheet. The question of what XML is and why to use it has been sufficiently well evangelized. The essential syntax and rules are reasonably well understood. However, although most developers know what a CDATA section is, they are not sure what to use one for. Although programmers know how to add attribute and child nodes to elements, they are not certain which one to use when.

Since XML has become a fundamental underpinning of new software systems, it becomes important to begin asking new questions, not just what XML is, but how does one use it effectively? Which techniques work and which don't? Perhaps most importantly, which techniques appear to work at first but fail to scale as systems are further developed? When I teach programming at my university, one of the first things I tell my students is that it is not enough to write programs that compile and produce the expected results. It is as important (perhaps more important) to write code that is extensible, legible, and maintainable. XML can be used to produce robust, extensible, maintainable, comprehensible systems or it can be used to create masses of unmaintainable, illegible, fragile, closed code. In the immortal words of Eric Clapton, “It's in the way that you use it.”

XML is not a programming language. It is a markup language; but it is being successfully used by many programmers. There have been markup languages before, but in the developer community XML is far and away the most successful. However, the newness and unfamiliarity of markup languages have meant that many developers are using it less effectively than they could. Many programmers are hacking together systems that work, but are not as robust, extensible, or portable as XML promises. This is to be expected. Programmers working with XML are pioneers exploring new territory, opening up new vistas in software, and accomplishing things that could not easily be accomplished just a few years ago. However, one definition of a pioneer is someone with an arrow in their back, and more than a few XML pioneers have returned from the frontier with arrows in their backs.

Five years after the initial release of XML into the world, certain patterns and antipatterns for the proper design of XML applications are becoming apparent. All of us in the XML community have made mistakes while exploring this new territory, the author of this book prominently among them. However, we've learned from those mistakes, and we're beginning to develop some principles that may help those who follow in our footsteps to avoid making the same mistakes we did. It is time to put up some caution signs in the road. We may not exactly say “Here there be dragons”, but we can at least say, “That road is a lot rockier than it looks at first glance, and you might really want to take this slightly less obvious but much smoother path off to the left.”

If you'd like to know more, the preface and ten chapters are posted in their entirety on the book's web page. The title is Effective XML; the ISBN number is 0-321-15040-6; the list price is $44.99; and you can order it from fine book stores everywhere including Amazon, Barnes & Noble, and Powell's. Enjoy and Happy XML!

Thursday, September 18, 2003

Mikhail Grushinskiy has posted XMLStarlet 0.6, an open source collection of command line programs written in C and based on libxml and libxslt that "can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands." Features include:

  • Check or validate XML files (simple well-formedness check, DTD, XSD, Relax NG)
  • Calculate values of XPath expressions on XML files
  • Search XML files for matches to given XPath expressions
  • Apply XSLT stylesheets to XML documents
  • Modify or edit XML documents
  • Format or pretty print XML documents
  • Browse tree structure of XML documents (in similar way to 'ls' command for directories)
  • Resolve XIncludes
  • Canonicalize documents
  • Escape/unescape special XML characters in input text
  • Print directory as XML document
  • Convert XML into PYX and PYX into XML

Version 0.6 now uses libxml2 2.5.11 and libxslt 1.0.33, adds binaries for Windows, Solaris 8 (SPARC), and Linux, and fixes some bugs.
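For comparison, two of the listed features, evaluating path expressions and escaping special characters, map onto Python's standard library. This is not XMLStarlet's interface, and ElementTree's path language is only a subset of the XPath 1.0 that XMLStarlet gets from libxml2 and libxslt. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape


def select_values(xml_text: str, path: str) -> list:
    """Return the text of each element matching `path`.

    ElementTree understands only a limited subset of XPath
    (e.g. './/item' or "item[@type='x']"), not full XPath 1.0.
    """
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


doc = "<doc><item>a</item><sub><item>b</item></sub></doc>"
print(select_values(doc, ".//item"))   # ['a', 'b']
print(escape("x < y & z"))             # x &lt; y &amp; z
print(unescape("x &lt; y &amp; z"))    # x < y & z
```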


Corel has released the Corel SVG Viewer 2.1, a browser plug-in for Windows NT 4 and later that enables Internet Explorer and Netscape Navigator to display Scalable Vector Graphics pictures.


Meanwhile, Adobe has posted a pre-alpha of version 6.0 of their SVG browser plug-in for Windows 98 and later. It supports parts of SVG 1.2, including some new video features. It also lets SVG code be embedded directly in the HTML. Only Internet Explorer is supported.


Genicorp has released OODoc, an open source Perl module for reading, writing, and modifying OpenOffice documents. OODoc provides access to the content, styles, and metadata in an OpenOffice document. (The site is in French, but the API is in English. Scroll to the bottom of the page for "une notice introductive en Anglais", an introductory note in English.) OODoc is dual licensed under the GPL and LGPL.
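OODoc itself is Perl. Because an OpenOffice document is a zip archive of XML files, a Python sketch of the same kind of access needs only the standard library. It assumes the OpenOffice 1.x layout, with content.xml, styles.xml, and meta.xml at the top level of the archive. It is not OODoc's API, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumed OpenOffice 1.x package layout: XML parts at the top level of a zip archive.
PARTS = ("content.xml", "styles.xml", "meta.xml")


def read_parts(document):
    """Return {part name: root element} for the main XML parts of an OpenOffice document.

    `document` may be a file name or a binary file object.
    """
    parts = {}
    with zipfile.ZipFile(document) as package:
        names = set(package.namelist())
        for name in PARTS:
            if name in names:
                parts[name] = ET.fromstring(package.read(name))
    return parts
```

For example, `read_parts("report.sxw")["meta.xml"]` would give the metadata tree of a (hypothetical) Writer file, ready for ordinary XML processing.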


Sun Microsystems has released StarOffice 7, a $79.95 payware office suite for Windows, Solaris, and Linux (based on the open source OpenOffice) that saves its files as zipped XML. New features in this release include PDF, Flash, DocBook, and XHTML export, a macro recorder, bidirectional text support for Hebrew and Arabic, improved Microsoft Office compatibility, MySQL integration, and XSLT transformations for output.

Wednesday, September 17, 2003

RenderX has released version 3.6 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. Version 3.6 improves SVG and embedded PostScript print handling support. It also adds support for linearized PDF files that display faster in web browsers (though still much slower than plain vanilla HTML.) The basic client is $299.95. The developer edition with an API is $999.95. The server version is $4999.95. Updates from 3.0 are free.


The W3C RDF Core Working Group has posted a note describing LBase: Semantics for Languages of the Semantic Web. This addresses the formalization of RDF and RDF-based languages. The abstract states,

This document presents a framework for specifying the semantics for the languages of the Semantic Web. Some of these languages (notably RDF [RDF-PRIMER] [RDF-VOCABULARY] [RDF-SYNTAX] [RDF-CONCEPTS] [RDF-SEMANTICS], and OWL [OWL]) are currently in various stages of development and we expect others to be developed in the future. This framework is intended to provide a framework for specifying the semantics of all of these languages in a uniform and coherent way. The strategy is to translate the various languages into a common 'base' language thereby providing them with a single coherent model theory.

We describe a mechanism for providing a precise semantics for the Semantic Web Languages (referred to as SWELs from now on). The purpose of this is to define clearly the consequences and allowed inferences from constructs in these languages.

Tuesday, September 16, 2003

freebxml.org has released ebxmlrr 2.1, an open source ebXML Registry. An XML registry/repository "can store XML, web services, or any other type of data, and the registry manages the entire life cycle of information in the repository using sophisticated meta-data technology." According to the announcement, this implements all the functionality required by version 2.1 of the ebXML Registry, most optional features, and several new features of the latest interim specifications for ebXML Registry version 3. The client package includes a Java API for XML Registries (JAXR) provider.


BEA has contributed XMLBeans to the Apache XML Project. XMLBeans is an XML-Java binding tool based on the W3C XML Schema Language that also provides access to the full underlying XML Infoset through an XML Cursor API. You can now download it from the Apache CVS server. Binaries and more are coming soon.


The W3C Cascading Style Sheets Working Group has posted the last call working draft of Cascading Style Sheets, level 2 revision 1, that is, CSS 2.1. Unusually for a new spec, this goes backwards from the previous version. It focuses on removing properties from CSS2 rather than adding them. The impetus for removal is the failure of browser vendors to implement them. Features removed include:

  • font-stretch
  • font-size-adjust
  • Aural style sheets
  • text-shadow

Like CSS2, CSS 2.1 supports media-specific style sheets, content positioning, table layout, internationalization, and some properties related to user interface. It also "corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's 'style' attribute and a new calculation of the 'clip' property)."

Comments are due by October 10.


Monday, September 15, 2003

Daniel Veillard has released version 2.5.11 of libxml2, the XML C library for Gnome. This release fixes a few bugs in RELAX NG validation and multithreading.


The W3C Quality Assurance (QA) Activity has posted three revised specifications on quality assurance.

These describe "a common framework for enhancing the quality practices of the W3C Working Groups in the areas of specification editing, production of test materials, and coordination efforts with internal and external groups." Comments are due by February 27.


The W3C has released version 8.1b of Amaya, their open source testbed web browser and authoring tool for Solaris, Linux, Windows, and Mac OS X that supports HTML, XHTML, XML, CSS, MathML, and SVG. This is a bug fix release.

Sunday, September 14, 2003

The W3C CSS working group has posted the second public working draft of CSS3 module: Paged media. According to the abstract,

This module describes the page model that partitions a flow into pages. It builds on the CSS3 Box model module and introduces and defines the page model and paged media. It adds functionality for pagination, page margins, headers and footers, footnotes and endnotes. Finally it extends generated content for the purpose of cross-references with page numbers.

New features since CSS2 include page-based floats, page-based counters, and named pages.

Saturday, September 13, 2003

The W3C XML Activity has posted a Proposal for XML Fragment Identifier Syntax 0.9 based on the work of the now-defunct XLink working group. In brief, this proposes that the XML media types define their fragment identifier syntax (the part after the # in a URL) by reference to the XPointer Framework and the XPointer element() Scheme. Full XPointer support would not be required, just ID-based pointers and tumblers.
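Here is a minimal Python sketch of how an ID-based pointer or a child-sequence tumbler might be resolved with ElementTree. In element(/1/2), /1 selects the document element and /2 selects its second child element. In element(intro/3), the lookup starts from the element whose ID is intro. Assumptions: IDs come from a plain `id` attribute (a real processor uses attributes declared as type ID), the function name is hypothetical, and only the body of an element() pointer or a bare shorthand ID is handled, not the full XPointer Framework.

```python
import xml.etree.ElementTree as ET


def resolve_element_pointer(root, data):
    """Resolve an element() pointer body such as '/1/2' or 'intro/3', or a bare ID.

    Assumption: IDs are read from a plain 'id' attribute.
    Returns the selected element, or None if nothing matches.
    """
    steps = data.split("/")
    if steps[0] == "":
        # Child sequence from the document: '/1' is the document element.
        if len(steps) < 2 or steps[1] != "1":
            return None
        current = root
        rest = steps[2:]
    else:
        # Leading name: look the element up by ID, then follow any child sequence.
        current = next((e for e in root.iter() if e.get("id") == steps[0]), None)
        rest = steps[1:]
    for step in rest:
        if current is None:
            return None
        children = list(current)  # ElementTree children are element nodes only
        index = int(step)
        if not 1 <= index <= len(children):
            return None
        current = children[index - 1]
    return current


doc = ET.fromstring(
    '<book><chapter id="intro"><p>a</p><p>b</p></chapter>'
    "<chapter><p>c</p></chapter></book>"
)
```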


Bob McWhirter has posted the second beta of Jaxen 1.1, an open source XPath library for Java that supports DOM, JDOM, dom4j, and ElectricXML.

Friday, September 12, 2003

Today is the last day to submit talks for Software Development 2004 West. The 2004 show has expanded the number of talks by about a third, so there's a better than usual chance of having your talks accepted. This conference will take place March 15-19, 2004, in Santa Clara. Once again I'll be chairing the XML track. However, there are also tracks covering Java, C++, Web Services, .NET, Architecture and Design, Wireless and Mobile computing, and more.


Daniel Veillard has released version 1.0.33 of libxslt, the GNOME XSLT library for C and C++. This release fixes assorted bugs.


James Clark has posted the first alpha of nXML, a new XML mode for Emacs which offers code completion based on Relax NG schemas.

Thursday, September 11, 2003

The OpenOffice Project has posted the fourth release candidate of OpenOffice 1.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. I used the previous 1.0 version to write Effective XML (which should be arriving in stores any day now). According to the announcement, "The build includes bug fixes and is probably speedier and more robust. No new features have been introduced since RC3. Just lots of important fixes."

Wednesday, September 10, 2003

Easypress Technologies has released Atomik Xport Personal Edition 1.0, a $995 payware program for exporting XML from QuarkXPress.

Monday, September 8, 2003

Norm Walsh has released DocBook slides 3.2, a customization layer for producing presentations in DocBook. This release fixes bugs and is now compatible with version 1.62 of the DocBook XSL style sheets.

Friday, September 5, 2003

I have to make an unexpected trip out of town for the next few days. I'll be checking e-mail, but updates will be a little slow here until I return on Wednesday.


Decisionsoft has posted the first alpha of Pathan 2.0, an open source add-on for Xerces-C that can evaluate XPath 2.0 expressions to select DOM nodes. Pathan 2 implements the May 3 working draft of XPath 2.0.

Thursday, September 4, 2003

Inventive Designers has released Scriptura 1.2.2, a €1995 payware WYSIWYG XSL Formatting Objects authoring tool that supports static objects and dynamic data loaded via XML or JDBC. It can produce XSLT, XSL-FO, XHTML, PDF and PCL5.

Wednesday, September 3, 2003

Dave Malcolm has posted Conglomerate 0.7.1, an open source GUI XML editor for Linux written in C, and based on libxml2 and the GTK+ and Gnome libraries. Conglomerate is published under the GPL.


Jérôme Siméon and Mary Fernández at Bell Labs have released Galax 0.31, an open-source implementation of XQuery 1.0.

Tuesday, September 2, 2003

Antenna House, Inc. has updated the beta of its payware XSL Formatter 3.0 for Linux and Windows. For the first time, a Solaris version is also available. Version 3.0 can format much longer documents, speed is increased, PDF output is now bundled, SVG support has been added, and many more languages can be hyphenated. However, this beta is not yet feature complete with respect to XSL-FO or version 2.5.


Alexandre Brillant has released FastParser 1.6.1, a $50 shareware, non-validating, SAX parser for Java. Version 1.6.1 fixes various bugs.

Brillant claims this parser is faster than Xerces and Crimson (which are known not to be the fastest parsers out there). However, his benchmarks only test one file, and it's not clear from his results whether FastParser was used in a mode that doesn't perform full well-formedness checking.


Monday, September 1, 2003

The KOffice developers have announced plans to adopt the OpenOffice XML file formats, starting with the next major release after KOffice 1.3.


Norm Walsh has released version 1.62.0 of his XSLT stylesheets for DocBook. This release adds a new locale, adds support for HTML tables, recognizes the new uri and orgname elements, enables line numbering in verbatim environments to start from numbers other than 1, and makes numerous other small changes.

Sunday, August 31, 2003

Frederic Laurent has posted Lantern 0.9, an open source GUI XPath query tool written in Java. Lantern is published under the GPL.

Saturday, August 30, 2003

Didier Demany has released xmloperator 2.3, an open source, tree-based XML editor written in Java. Editing can be guided by a RELAX NG schema or a DTD. xmloperator is published under a BSD-like license.

Friday, August 29, 2003

The Mozilla Project has posted the first beta of Mozilla 1.5, the open source web browser that supports XML, CSS, XSLT, XUL, HTML, XHTML, MathML, SVG, and lots of other crunchy XML goodness. It's available for Linux, Mac OS X, and Windows. Besides bug fixes, improvements include:

  • Unstyled XML display has been improved.
  • A spellchecker for MailNews and Composer.
  • MailNews can print the attachments list.
  • An overhauled ChatZilla.
  • MailNews users can add header lines to every message sent out via a certain identity.
  • Users can now mark messages as read by date in MailNews.
  • Gecko supports setting color for <HR> and <BR>.
  • View source displays line and column numbers in the status bar.

Opera Software has posted a beta of version 7.20 of its namesake web browser for Windows. Opera is $39 payware or free-beer adware. Opera supports XML, CSS, HTML, and various other acronyms. New features in version 7.20 include bidirectional text layout for Hebrew and Arabic and the data URL protocol for Base64 encoded content. Many small bugs are fixed as well. On the minus side, version 7.20 adds support for the marquee and blink tags, which can only make one wonder what kind of crack is being smoked in Norway these days. Most of the world recognized these tags were bad ideas in 1995.

Thursday, August 28, 2003

BEA Systems has posted the proposed final draft of Java Specification Request (JSR) 173, Streaming API for XML (StAX), to the Java Community Process. This JSR proposes a Java-based, pull-parsing API for XML. StAX offers two approaches. XMLStreamReader and XMLStreamWriter are a cursor API designed to read and write XML as efficiently as possible. XMLEventReader and XMLEventWriter are an iterator API designed to be easy to use, event based, easy to extend, and allow easy pipelining. The iterator API sits on top of the cursor API.
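StAX itself is Java. To give a feel for pull parsing, here is a rough Python analog that uses the standard library's `xml.etree.ElementTree.XMLPullParser`. It is not StAX and differs in detail: the application feeds the parser chunks of input and then pulls whatever events are ready. Python also identifies events with strings such as `"start"`, much like the integer type codes criticized below. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def start_tags(chunks):
    """Feed XML chunk by chunk and pull start-tag events as they become available."""
    parser = ET.XMLPullParser(events=("start",))
    tags = []

    def drain():
        # read_events() yields the (event, element) pairs parsed so far.
        for event, element in parser.read_events():
            if event == "start":
                tags.append(element.tag)

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()
    return tags


print(start_tags(["<doc><title>StAX</title>", "<para>pull</para></doc>"]))
# ['doc', 'title', 'para']
```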

There is a reference implementation bundled with the spec and JavaDoc. I haven't had time to write code with it yet, or to test the performance; but overall from the spec and JavaDoc I'd say this is the cleanest, most XML conformant pull parser I've seen to date. It's definitely a substantial improvement on XMLPULL. I'm giving the spec a thorough going over right now. So far my comments are mostly minor editorial issues. A few more major points:

  • The API relies on integer type codes rather than classes to type the different kinds of events. This means programs are littered with big switch statements to find out what kind of node they've got. Many methods apply to some nodes but not others. This is a very ugly, un-object oriented design. NekoPull is the example of the right way to solve this problem.
  • The entity resolver doesn't support public IDs.
  • Whether or not CDATA sections are reported depends on their location. They should never be reported differently than any other text.

But that's it. Most of my other comments were even more minor and editorial. Overall, this looks like a very nice API.
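The CDATA point is easy to demonstrate with a standard-library analog rather than StAX. In this Python sketch, ElementTree reports a CDATA section and the equivalent escaped text identically, which is the behavior argued for above. minidom, in contrast, keeps the CDATA section as its own node type, so code using it must handle both kinds of node.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

WITH_CDATA = "<code><![CDATA[if (a < b && c) {}]]></code>"
ESCAPED = "<code>if (a &lt; b &amp;&amp; c) {}</code>"

# ElementTree reports both forms as the same plain text.
etree_texts = {ET.fromstring(doc).text for doc in (WITH_CDATA, ESCAPED)}

# minidom keeps a distinct CDATASection node for the first form.
cdata_node = minidom.parseString(WITH_CDATA).documentElement.firstChild
```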

Wednesday, August 27, 2003

Version 1.7.1 of Jaxe, an open source (GPL) XML GUI editor written in Java 1.3, has been released. It is configurable with an XML schema and a configuration file, supports validation at element insertion, is customisable via Java modules, and can use XSLT to display documents as XHTML. A little unusually, the configuration files are written in French--ENCODAGE instead of ENCODING, BALISE instead of TAG, etc. The user interface has been localized into French, English, and German. Version 1.7.1 adds an equation editor and a new table element.


Mikhail Grushinskiy has posted XMLStarlet 0.5.1, an open source collection of command line programs written in C and based on libxml and libxslt that "can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands." Features include:

  • Check or validate XML files (simple well-formedness check, DTD, XSD, Relax NG)
  • Calculate values of XPath expressions on XML files
  • Search XML files for matches to given XPath expressions
  • Apply XSLT stylesheets to XML documents
  • Modify or edit XML documents
  • Format or pretty print XML documents
  • Browse tree structure of XML documents (in similar way to 'ls' command for directories)
  • Resolve XIncludes
  • Canonicalize documents
  • Escape/unescape special XML characters in input text
  • Print directory as XML document
  • Convert XML into PYX and PYX into XML

Tuesday, August 26, 2003

The W3C Device Independence Working Group has published the first working draft of Glossary of Terms for Device Independence. This defines a number of terms such as authored unit, browser, client, HTTP server, harmonized user experience, perceived unit, and so forth.


Bill Venners has posted the penultimate installment of his interview with me about XOM and API design at Software Development 2002 West last spring, Designing by Dictatorship, Examples, and Tests.


Xavier Franc has posted Qizx/Open 0.1, an open source implementation of XQuery, written in Java. It conforms to XQuery Basic with Static Type Checking.

Monday, August 25, 2003

I've posted version 1.0d20 of XOM, my open source streaming/tree API for processing XML with Java. This release works around some Java bugs in handling file URLs on Windows that prevented the Builder.build(File) method from resolving relative URLs used in system identifiers. All unit tests now pass on Windows as well as Unix. In addition, the JAR file is shipped uncompressed for faster class loading.


The W3C Multimodal Interaction working group has posted the first public working draft of EMMA: Extensible MultiModal Annotation markup language. According to the abstract, "this document is part of a set of specifications for multi-modal systems, and provides details of an XML markup language for describing the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from a speech or pen input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers."


The W3C Internationalization Working Group has published a new working draft of Character Model for the World Wide Web 1.0. "This Architectural Specification provides authors of specifications, software developers, and content developers with a common reference for interoperable text manipulation on the World Wide Web. Topics addressed include character encoding identification, early uniform normalization, string identity matching, string indexing, and URI conventions, building on the Universal Character Set, defined jointly by the Unicode Standard and ISO/IEC 10646. Some introductory material on characters and character encoding is also provided." This draft updates the wider world on the progress of the working group. At first glance, most of the changes since last year's draft seem to be editorial, not substantive. A little unusually, the working group states, "We do not encourage comments on this Working Draft; instead we ask reviewers to wait for being informed about our disposition of their comments, or for Canditate [sic] Recommendation in case of new comments."
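String identity matching is where normalization bites in practice: two strings can render identically yet differ code point by code point. Here's a minimal Java sketch, assuming Unicode Normalization Form C (NFC) as the common form and using java.text.Normalizer, which was added in Java 6; the class name NormalizedMatch is hypothetical.

```java
import java.text.Normalizer;

public class NormalizedMatch {

    // Compares two strings after converting both to Unicode Normalization Form C.
    public static boolean sameText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00E9"; // é as the single code point U+00E9
        String decomposed = "cafe\u0301"; // e followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(precomposed.equals(decomposed));    // false: different code points
        System.out.println(sameText(precomposed, decomposed)); // true after NFC
    }
}
```

"Early" normalization in the draft's sense means doing this once when text is produced, so that later comparisons can be plain code-point equality.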


Toni Uusitalo has posted Parsifal 0.7.1, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. This release fixes assorted bugs. Parsifal is in the public domain.


Peter J. Jones has posted xmlwrapp 0.4.3, a C++ library for working with XML built on top of Daniel Veillard's libxml2. This release fixes bugs. xmlwrapp is published under a BSD license.


Alexandre Brillant has released JXMLPad 1.9.2, a €90 shareware JavaBean component for editing XML. Java 1.2 or later is required.


Brendan Macmillan has posted version 2.0.9.8 of Java Serialization for XML (JSX) 2, a library for converting Java objects into streams of XML and reading the objects back from the streams. To use it, replace ObjectOutputStream with JSX.ObjectWriter and ObjectInputStream with JSX.ObjectReader. This release improves compatibility with several virtual machines including Sun's J2ME 1.0a and 1.0.1 and IBM's JDK 1.3.1.
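JSX's own classes aren't reproduced here. As a rough stand-in for the idea of writing Java objects to XML and reading them back, here's a sketch using a different mechanism, the JDK's java.beans.XMLEncoder and XMLDecoder, which handle JavaBeans and standard collections rather than arbitrary Serializable objects. Under JSX, per the announcement, the equivalent step is swapping ObjectOutputStream and ObjectInputStream for JSX.ObjectWriter and JSX.ObjectReader. The class name XmlObjectRoundTrip is hypothetical.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class XmlObjectRoundTrip {

    // Serializes an object graph to an XML string with the JDK's XMLEncoder (not JSX).
    public static String toXml(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(value);
        } // close() flushes the encoder and writes the closing </java> tag
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Reads the first object back from a string produced by toXml.
    public static Object fromXml(String xml) {
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        List<String> tools = new ArrayList<>();
        tools.add("JSX");
        tools.add("XOM");
        String xml = toXml(tools);
        System.out.println(xml);
        System.out.println(tools.equals(fromXml(xml)));
    }
}
```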

Sunday, August 24, 2003

Norm Walsh has posted the second beta of DocBook 4.3, the XML application for technical documentation that I used to write Processing XML with Java. The changes are mostly minor and all backwards compatible. They include a PDF notation, a namespace attribute on the sgmltag element, a language attribute for verbatim environments, HTML tables, bidirectional text override, a uri element, and various other minor additions to existing elements.


Steve Cheng has posted docbook2X 0.8.2, an open source package for Unix that converts DocBook files to man pages and Texinfo.

Saturday, August 23, 2003

The W3C XQuery and XSLT Working Groups have deposited another batch of working drafts into the world:

XML Path Language (XPath) 2.0
"This version contains a new section entitled "Processing Model" that provides a more complete and detailed description of expression processing. It also contains specific error codes for various error conditions, and a glossary in which many terms are defined. The section on Optional Features has been rewritten. The term Basic XPath is no longer used. Changes have been made in the details of certain kinds of expressions." The input() function has been deleted.
XPath Requirements 2.0
"This document is the first revision of the [XPath 2.0 Requirements] working draft. This revision includes, for each requirement, a corresponding status, indicating the current situation of the requirement in the XPath 2.0 family of specifications. A future revision will be provided when all remaining open issues have been resolved and when the XPath 2.0 documents are issued as Last Call working drafts."
XQuery 1.0: An XML Query Language
"This version contains a new section entitled "Processing Model" that provides a more complete and detailed description of expression processing. It also contains specific error codes for various error conditions, and a glossary in which many terms are defined. The section on Optional Features has been rewritten. The term Basic XQuery is no longer used. A new optional feature called the Full Axis Feature (supporting all the XPath axes except namespace) has been added. Three new types of computed constructors are introduced, and the syntax for declaring various objects in module prologs has changed. Changes have been made in the details of certain kinds of expressions."
XQuery 1.0 and XPath 2.0 Formal Semantics

This version reflects the most recent semantics of [XPath/XQuery]. Among the most important changes from the previous version of this document are:

  • Implementation of the semantics of namespaces in [XPath/XQuery]. This closes Issue 443 (FS-Issue-0100: Namespace resolution) and Issue 508 (FS-Issue-0165: Namespaces in element constructors).
  • A simplified static semantics for path expressions. This closes Issue 475 (FS-Issue-0132: Typing for descendant), Issue 527 (Static typing of XPath index expressions), and Issue 560 (Exactness of Type Inference).
XQuery Use Cases
The example syntax has been updated in sync with the new XQuery 1.0/XPath 2.0 working drafts, and a few errors have been fixed. No use cases have been added or removed.

Friday, August 22, 2003

Intellidimension posted the second beta of InferEd, a Resource Description Framework (RDF) editor that supports inference. Features include:

  • n-triples and n3 files (in addition to rdf+xml)
  • acts as a URIQA client
  • reification
  • Multi-selection of statements and resources
  • Exports to http and ftp URLs and to RDF Gateway tables
  • Editing of XMP data in Adobe PDF docs

And if you understand what any of that means, you know more about RDF than I do. Pricing has not yet been announced.


Antenna House, Inc. has posted a beta of version 3.0 of its payware XSL Formatter for Linux and Windows. This release can format much longer documents, speed is increased, PDF output is now bundled, SVG support has been added, and many more languages can be hyphenated. However, this beta is not yet feature complete with respect to XSL-FO or version 2.5.


The Writer's Forge has posted Bellows 0.20, a class library for processing XML with Java based on hash maps and lists. The key class is Datum, which represents an XML element. Datums have maps of properties, namespaces, and lists of children. One interesting idea I haven't seen in previous APIs is that properties are inherited from the nearest ancestor-or-self with the given property. This would be useful in some cases (think of determining the current value of xml:lang or xml:space) but I'm not sure it's always applicable. Bellows includes a custom query language inspired by XPath.
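The inherited-property idea is easy to emulate for xml:lang with the standard DOM. This sketch walks the ancestor-or-self axis until it finds the attribute; it uses org.w3c.dom from JAXP, not Bellows, whose API isn't reproduced here, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class InheritedLang {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Returns xml:lang from the nearest ancestor-or-self element, or null if none has it.
    public static String effectiveLang(Element element) {
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            Element current = (Element) n;
            if (current.hasAttributeNS(XML_NS, "lang")) {
                return current.getAttributeNS(XML_NS, "lang");
            }
        }
        return null; // reached the Document node without finding xml:lang
    }

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so xml:lang is reported in the XML namespace
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc xml:lang='en'><p xml:lang='fr'><q/></p><p><q/></p></doc>");
        for (int i = 0; i < 2; i++) {
            Element q = (Element) doc.getElementsByTagName("q").item(i);
            System.out.println(effectiveLang(q));
        }
    }
}
```

The same loop works for xml:space; for arbitrary application properties, though, whether inheritance is the right default depends on the vocabulary.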


The W3C Web Services Architecture Working Group has updated the working drafts of Web Services Architecture and Web Services Glossary.

Web services provide a standard means of interoperating between different software applications, running on a variety of platforms and/or frameworks. This document (WSA) is intended to provide a common definition of a Web service, and define its place within a larger Web services framework to guide the community.

The WSA provides a model and a context for understanding Web services and the relationships between the various specifications and technologies that comprise the WSA. The WSA promotes interoperability through the definition of compatible protocols. The architecture does not attempt to specify how Web services are implemented, and imposes no restriction on how services might be combined. The WSA describes both the minimal characteristics that are common to all Web services, and a number of characteristics that are needed by many, but not all, Web services.

The WSA integrates different conceptions of Web services under a common "reference architecture". There isn't always a simple one to one correspondence between the architecture of the Web and the architecture of existing SOAP-based Web services, but there is a substantial overlap.

We offer a framework for the future evolution of Web services standards that will promote a healthy mix of interoperability and innovation. That framework must accommodate the edge cases of pure SOAP-RPC at one side and HTTP manipulation of business document resources at the other side, but focus on the area in the middle where the different architectural styles are both taken into consideration.

Thursday, August 21, 2003

The W3C Multimodal Interaction Working Group has published the first public working draft of the Ink Markup Language. According to the abstract,

The Ink Markup Language serves as the data format for representing ink entered with an electronic pen or stylus. The markup allows for the input and processing of handwriting, gestures, sketches, music and other notational languages in Web-based applications. It provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.

The following example of writing the word "hello" in InkML is given in the spec:

<ink>
  <trace>
    10 0 9 14 8 28 7 42 6 56 6 70 8 84 8 98 8 112 9 126 10 140
    13 154 14 168 17 182 18 188 23 174 30 160 38 147 49 135
    58 124 72 121 77 135 80 149 82 163 84 177 87 191 93 205
  </trace>
  <trace>
    130 155 144 159 158 160 170 154 179 143 179 129 166 125
    152 128 140 136 131 149 126 163 124 177 128 190 137 200
    150 208 163 210 178 208 192 201 205 192 214 180
  </trace>
  <trace>
    227 50 226 64 225 78 227 92 228 106 228 120 229 134
    230 148 234 162 235 176 238 190 241 204
  </trace>
  <trace>
    282 45 281 59 284 73 285 87 287 101 288 115 290 129
    291 143 294 157 294 171 294 185 296 199 300 213
  </trace>
  <trace>
    366 130 359 143 354 157 349 171 352 185 359 197
    371 204 385 205 398 202 408 191 413 177 413 163
    405 150 392 143 378 141 365 150
  </trace>
</ink>

<sarcasm>Gee, that's not the least bit opaque.</sarcasm> This looks like the SVG mistake all over again. The right way to solve this problem is something like this:

<ink>
  <trace>
    <coordinate><x>10</x> <y>0</y></coordinate>
    <coordinate><x>9</x> <y>14</y></coordinate>
    <coordinate><x>8</x> <y>28</y></coordinate>
    <coordinate><x>7</x> <y>42</y></coordinate>
    <coordinate><x>6</x> <y>56</y></coordinate>
    <coordinate><x>6</x> <y>70</y></coordinate>
    <coordinate><x>8</x> <y>84</y></coordinate>
    <coordinate><x>8</x> <y>98</y></coordinate>
    <coordinate><x>8</x> <y>112</y></coordinate>
    <coordinate><x>9</x> <y>126</y></coordinate>
    <coordinate><x>10</x> <y>140</y></coordinate>
    <coordinate><x>13</x> <y>154</y></coordinate>
    <coordinate><x>14</x> <y>168</y></coordinate>
    <coordinate><x>17</x> <y>182</y></coordinate>
    <coordinate><x>18</x> <y>188</y></coordinate>
    <coordinate><x>23</x> <y>174</y></coordinate>
    <coordinate><x>30</x> <y>160</y></coordinate>
    <coordinate><x>38</x> <y>147</y></coordinate>
    <coordinate><x>49</x> <y>135</y></coordinate>
    <coordinate><x>58</x> <y>124</y></coordinate>
    <coordinate><x>72</x> <y>121</y></coordinate>
    <coordinate><x>77</x> <y>135</y></coordinate>
    <coordinate><x>80</x> <y>149</y></coordinate>
    <coordinate><x>82</x> <y>163</y></coordinate>
    <coordinate><x>84</x> <y>177</y></coordinate>
    <coordinate><x>87</x> <y>191</y></coordinate>
    <coordinate><x>93</x> <y>205</y></coordinate>
  </trace>
</ink>

That's more verbose, but it's also much clearer. It would let the data be extracted with standard XML tools rather than requiring each user to write their own micro-parser for the trace elements. If InkML really can't afford to mark up the x and y coordinates as x and y coordinates instead of raw text, then why is it using XML at all?
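Here's roughly what that micro-parser looks like in Java. It's a sketch that assumes every trace is plain whitespace-separated integer x y pairs, as in the spec's "hello" example; the draft may allow other channels or number formats that this doesn't handle. The class TraceReader and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TraceReader {

    // The micro-parser: split a trace's text into integers and pair them up as x, y.
    public static List<int[]> parseTrace(String text) {
        List<int[]> points = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return points;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("trace has an odd number of values");
        }
        for (int i = 0; i < tokens.length; i += 2) {
            points.add(new int[] {Integer.parseInt(tokens[i]), Integer.parseInt(tokens[i + 1])});
        }
        return points;
    }

    // Parses an <ink> document and returns one point list per <trace> element.
    public static List<List<int[]>> readInk(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList traces = doc.getElementsByTagName("trace");
            List<List<int[]>> result = new ArrayList<>();
            for (int i = 0; i < traces.getLength(); i++) {
                result.add(parseTrace(traces.item(i).getTextContent()));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse ink document", e);
        }
    }

    public static void main(String[] args) {
        String ink = "<ink><trace>10 0 9 14 8 28</trace><trace>130 155 144 159</trace></ink>";
        List<List<int[]>> traces = readInk(ink);
        System.out.println(traces.size() + " traces; first has "
                + traces.get(0).size() + " points");
    }
}
```

With the element-per-coordinate markup, by contrast, a generic XPath expression such as //coordinate/x does the same job with no custom parsing at all.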


The W3C Web Services Choreography Working Group has posted the first public working draft of Web Services Choreography Requirements 1.0. According to the abstract, "As the momentum around Web Services grows, the need for effective mechanisms to co-ordinate the interaction among Web Services and their users becomes more pressing. The Web Services Choreography Working Group has been tasked with the development of such a mechanism in an interoperable way. This document describes a set of requirements for Web Services choreography based around a set of representative use cases, as well as general requirements for interaction among Web Services. This document is intended to be consistent with other efforts within the W3C Web Services Activity."


The W3C has released Amaya 8.1a, their open source testbed web browser and authoring tool for Solaris, Linux, and Windows that supports HTML, XHTML, XML, CSS, MathML, and SVG. This is a bug fix release.


Syntext has released Dtd2Xs 1.3, a tool for converting complex, modularized XML DTDs to W3C XML Schema Language schemas. Dtd2Xs runs on Windows and Linux. Dtd2Xs is $49 on Windows, $39 on Linux, and free for non-commercial use. This is a bug fix release.


Opera Software has released version 6.03 of its namesake web browser for the Macintosh. This release is optimized for Mac OS X 10.3. OS versions back to 8.6 are also supported (which is a good thing, since Mac OS X 10.3 hasn't actually shipped yet). Opera supports XML, CSS, HTML, and various other acronyms. Opera is $39 payware or free-beer adware.

Wednesday, August 20, 2003

The W3C Web Ontology Working Group has issued candidate recommendations of all six of its specifications:

Quoting from the overview document,

The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has three increasingly-expressive sublanguages: OWL Lite, OWL DL, and OWL Full.

Comments on all these are due by September 20.

Tuesday, August 19, 2003

Dennis Sosnoski has posted the second beta of JiBX, yet another open source framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding document camp as opposed to the schema driven binding frameworks like JaxMe and JAXB. I haven't looked at JiBX in detail, but in general I do find the APIs based on custom binding documents to be a lot more flexible and potentially useful than those based on a schema. Quoting from the JiBX web site,

JiBX is a framework for binding XML data to Java objects. It lets you work with data from XML documents using your own class structures. The JiBX framework handles all the details of converting your data to and from XML based on your instructions. JiBX is designed to perform the translation between internal data structures and XML with very high efficiency, but still allows you a high degree of control over the translation process.

How does it manage this? JiBX uses binding definition documents to define the rules for how your Java objects are converted to or from XML (the binding). At some point after you've compiled your source code into class files you execute the first part of the JiBX framework, the binding compiler. This compiler enhances binary class files produced by the Java compiler, adding code to handle converting instances of the classes to or from XML. After running the binding compiler you can continue the normal steps you take in assembling your application (such as building jar files, etc.).

The second part of the JiBX framework is the binding runtime. The enhanced class files generated by the binding compiler use this runtime component both for actually building objects from an XML input document (called unmarshalling, in data binding terms) and for generating an XML output document from objects (called marshalling). The runtime uses a parser implementing the XMLPull API for handling input documents, but is otherwise self-contained.


Morphon has released the Morphon XML Editor 3.1.2, an XML editor that provides WYSIWYG, source, and tree views. This release adds fallback grammars, fixes printing problems, and includes a RenderPDF plugin based on FOP. Morphon is free-beer. Registration is required.

Monday, August 18, 2003

CMP has posted the Call for Papers for Software Development 2004 West. This conference will take place March 15-19, 2004, in Santa Clara. Once again I'll be chairing the XML track. XML-wise, this conference tends to focus on more practical, how-to sessions than a typical XML conference, which runs more advanced and theoretical sessions.

We're looking for sessions that speak to programmers who are not necessarily XML experts but need to learn how to use SAX, DOM, namespaces, XSLT, schemas, etc. We're not as focused on bleeding edge topics and research projects as the more XML-specific shows. We're looking for ninety minute seminars, and half-day and full-day tutorials. If you haven't presented at this show before, you're much more likely to get picked for one or two ninety minute sessions than for a half or full-day session. We like to get to know speakers with a smaller session first. Submissions are due by September 5.


The W3C DOM Working Group has posted the Candidate Recommendation of DOM Level 3 Validation. Changes since the last draft include:

  • validateDocument now returns a short to indicate the result rather than throwing an exception.
  • The named constants in NodeEditVAL are now as follows:
        // validationType
        public static final short VAL_WF                    = 1;
        public static final short VAL_NS_WF                 = 2;
        public static final short VAL_INCOMPLETE            = 3;
        public static final short VAL_SCHEMA                = 4;
    
        // validationState
        public static final short VAL_TRUE                  = 5;
        public static final short VAL_FALSE                 = 6;
        public static final short VAL_UNKNOWN               = 7;
  • ElementEditVAL has the following new members:
        public static final short VAL_EMPTY_CONTENTTYPE     = 1;
        public static final short VAL_ANY_CONTENTTYPE       = 2;
        public static final short VAL_MIXED_CONTENTTYPE     = 3;
        public static final short VAL_ELEMENTS_CONTENTTYPE  = 4;
        public static final short VAL_SIMPLE_CONTENTTYPE    = 5;
    
        public NameList getAllowedChildren();
        public NameList getAllowedFirstChildren();
        public short getContentType();
        public short canSetTextContent(String possibleTextContent);
  • The RangeVAL interface has been deleted.

The W3C Scalable Vector Graphics (SVG) Working Group has released a new version of the SVG test suite, covering more of SVG 1.1 and mobile SVG. The test results mention a prerelease version 6 of the Adobe SVG viewer, which appears to be quite conformant.

Sunday, August 17, 2003

Daniel Veillard has released version 2.5.10 of libxml2, the XML C library for GNOME, and version 1.0.32 of libxslt, the GNOME XSLT library for C and C++. These releases fix assorted bugs and include a few speedups.


Gerald Bauer has posted beta 8 of Luxilla, an open source "'runtime' that turns Luxor XUL markup into live windows, dialogs, menus, toolbars and more without requiring a single-line of Java code. Pass on the chrome folder holding your XUL markup to Luxilla and see the XUL markup come to life." Luxilla is published under the GPL.


The OpenOffice Project has posted the third release candidate of OpenOffice 1.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. I used the previous 1.0 version to write Effective XML. The main focus of this release is improved support for Mac OS X. However, Mac OS X binaries do not appear to be available.

Saturday, August 16, 2003

The W3C XML Core working group has published a working draft of xml:id Requirements. The basic goal here is to define a general purpose attribute that can be used as an ID in the absence of a DTD or schema.
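Until something like xml:id is standardized, a DOM program without a DTD or schema has to tell the DOM which attributes are IDs itself. This sketch does that with DOM Level 3's Element.setIdAttributeNS (available in Java 5 and later), assuming the attribute is spelled xml:id in the XML namespace as the draft's title suggests; XmlIdDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlIdDemo {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Parses without any DTD or schema, then flags every xml:id attribute as an ID.
    public static Document parseWithXmlIds(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so xml:id is reported in the XML namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            markXmlIds(doc.getDocumentElement());
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    // Recursively declares each element's xml:id attribute to be of type ID.
    static void markXmlIds(Element element) {
        if (element.hasAttributeNS(XML_NS, "id")) {
            element.setIdAttributeNS(XML_NS, "id", true);
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                markXmlIds((Element) child);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parseWithXmlIds("<doc><p xml:id='intro'>Hello</p></doc>");
        // getElementById only finds attributes the DOM knows to be IDs.
        Element intro = doc.getElementById("intro");
        System.out.println(intro.getTextContent());
    }
}
```

A standard xml:id would make the markXmlIds pass unnecessary, since the parser itself could treat the attribute as an ID.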


The W3C HTML Activity has published the proposed recommendation of XML Events, a module that "provides XML languages with the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 event interfaces [DOM2EVENTS]. The result is to provide an interoperable way of associating behaviors with document-level markup."

Friday, August 15, 2003

Version 2.0.3 of the payware <Oxygen/> XML editor has been released. Oxygen supports XML, XSL, DTDs, and the W3C XML Schema Language. New features in version 2.0.3 include:

  • Eclipse plug-in
  • Batch validation
  • Regular expression groups support in search/replace
  • Configurable menu shortcut keys
  • Enhanced syntax coloring

Oxygen requires Java 1.3 or later. It costs $74.

Thursday, August 14, 2003

The Apache Project has released Cocoon 2.1, an open source "web development framework built around the concepts of separation of concerns and component-based web development. Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming." Cocoon can assemble data from many sources including filesystems, SQL databases, LDAP, native XML databases, and SAP. It can customize the output to generate HTML, WML, PDF, SVG, and RTF from the same inputs. Processes it supports include XSL transformation and XInclude resolution. Cocoon can run as a servlet inside an existing web server or standalone through a commandline interface.
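Cocoon's pipelines are configured rather than programmed, but the underlying idea of chaining components can be sketched in plain JAXP. This is not Cocoon's API: it chains two XSLT stages with SAXTransformerFactory so the SAX output of the first stylesheet feeds the second directly, with no intermediate serialization. The stylesheets, the list/item/page vocabulary, and the class name are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TwoStagePipeline {

    // Stage 1: turn a hypothetical <list>/<item> vocabulary into <ul>/<li>.
    static final String STAGE1 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='list'><ul><xsl:apply-templates/></ul></xsl:template>"
      + "<xsl:template match='item'><li><xsl:value-of select='.'/></li></xsl:template>"
      + "</xsl:stylesheet>";

    // Stage 2: wrap whatever stage 1 produced in a <page> element.
    static final String STAGE2 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><page><xsl:copy-of select='*'/></page></xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String input) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler first = factory.newTransformerHandler(
                new StreamSource(new StringReader(STAGE1)));
            TransformerHandler second = factory.newTransformerHandler(
                new StreamSource(new StringReader(STAGE2)));

            // Wire the stages: stage 1 emits SAX events straight into stage 2,
            // and stage 2 serializes the final result.
            StringWriter out = new StringWriter();
            first.setResult(new SAXResult(second));
            second.setResult(new StreamResult(out));

            // An identity transformer parses the input and feeds stage 1.
            Transformer source = factory.newTransformer();
            source.transform(new StreamSource(new StringReader(input)), new SAXResult(first));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<list><item>a</item><item>b</item></list>"));
    }
}
```

Adding a stage is just another TransformerHandler wired into the chain; Cocoon's contribution is letting you declare that wiring instead of writing it.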


The W3C CSS working group has posted the first working draft of CSS3 module: Syntax. According to the abstract,

This CSS3 module describes the basic structure of CSS style sheets, some of the details of the syntax, and the rules for parsing CSS style sheets. It also describes (in some cases, informatively) how stylesheets can be linked to documents and how those links can be media-dependent. Additional details of the syntax of some parts of CSS described in other modules will be described in those modules. The selectors module has a grammar for selectors. Modules that define properties give the grammar for the values of those properties, in a format described in this document.

New features since CSS2 include support for namespaces and the ability to support custom CSS properties without conflicting with the standard properties.


The W3C CSS working group has also posted the first working draft of CSS3 module: Presentation levels. This defines a single new property, presentation-level. According to the abstract,

Presentation levels are integer values attached to elements in a document. Elements that are below, at, or above a certain threshold can be styled differently. This feature has two compelling use cases. First, slide presentations with transition effects can be described. For example, list items can be progressively revealed by sliding in from the side. Second, outline views of documents, where only the headings to a certain level are visible, can be generated.

Finally, the CSS working group has posted a working draft of CSS Print Profile. Its abstract states:

This specification defines a subset of the Cascading Style Sheets Level 2 [CSS2] and specification with additions from the proposed features of CSS3 module: Paged Media [PAGEMEDIA], specifically for printing to low-cost devices. It is designed for printing from mobile devices, where it is not feasible or desirable to install a printer-specific driver, and for situations were some variability between the device's view of the document and the formatting of the output is acceptable to provide a strong basis for rich printing results without a detailed understanding of each individual printer's characteristics.

Syntext has posted a beta of Serna, an XSL-based WYSIWYG XML Document Editor for Windows and Linux. Registration is required for the beta. Pricing has not yet been announced.

Wednesday, August 13, 2003

Makoto Yui has posted XpSQL 0.9, an open source XML database environment that sits on top of PostgreSQL. XpSQL supports DOM and XPath. The project states "XpSQL is suitable for use of a Data-centric XML."


Wolfgang Meier of the Darmstadt University of Technology has posted version 0.9.2 of eXist, an open source native XML database that supports fulltext search. XML can be stored in either the internal, native XML-DB or an external relational database. The search engine has been designed to provide fast XPath queries, using indexes for all element, text and attribute nodes. The server is accessible through HTTP and XML-RPC interfaces and supports the XML:DB API for Java programming. Besides bug fixes, new features in version 0.9.2 include XUpdate support. eXist is published under the LGPL.


RenderX has released version 3.5.4 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. Version 3.5.4 is a bug fix release. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $4999.95. Updates from 3.0 are free.

Tuesday, August 12, 2003

Bill Venners has posted two more installments of his interview with me about XOM and API design at Software Development 2002 West last spring:


Mark Baker has posted the working draft of Atom 0.2, the proposed XML syndication format that's a successor to RSS. Atom was formerly known as Pie and Echo. The biggest new feature in this release is using MIME multipart/alternative messages, which is beginning to move it away from XML as the basic envelope.


The XML Apache Project has released Xalan-C++ 1.6, an open source XSLT processor written in standard C++. Version 1.6 fixes bugs, includes ports for NetBSD and FreeBSD, and speeds up the UTF-8 and UTF-16 serializers.


The XML Apache Project has also released XML Security C++ 1.0, a C++ library that implements XML digital signatures. This release supports Xerces 2.2 and 2.3 and Xalan 1.6.


The printed book is due out next month, but the Unicode Consortium has posted the complete book in PDF format on its website. Unicode 4.0 adds 1,226 new characters including "currency symbols, additional Latin and Cyrillic characters, the Limbu and Tai Le scripts; Yijing Hexagram symbols, Khmer symbols, Linear B syllables and ideograms, Cypriot, Ugaritic, and a new block of variation selectors (especially for future CJK variants). Double diacritic characters were added for dictionary use."


In related news, the Unicode Consortium has posted the first beta of Unicode 4.0.1 (the data files, not the book). "This is the first update of Unihan.txt since Unicode 3.2, and it includes a large number of corrections and additions. There are several other minor changes to other data files." Comments are due by August 18, 2003.


The W3C Technical Architecture Group (TAG) has posted a new editor's draft of Architecture of the World Wide Web. According to the abstract,

The World Wide Web is an information system that relates information sources and services through the use of hypertext-style relationships, creating a web of information that spans the Internet. Architecture defines the desired operational behavior of components within this information system, including software, machine, and human components, and protocols for interactions between components. The Web architecture is influenced by social requirements and software engineering principles, leading to design choices that constrain the behavior of the Web in order for the system to achieve desired properties: an efficient, scalable, shared information space that will continue to grow indefinitely across languages, cultures, and information mediums. This document is organized to reflect the three dimensions of Web architecture: identification, interaction, and representation.


Cladonia Ltd. has released the Exchanger XML Editor 1.0, a $98 payware Java-based XML Editor. Features include:

  • Schema Based Editing
  • Tag Prompting
  • Validation against DTD, XML Schema, RelaxNG
  • Tree View and Outliner for Tag Free editing
  • XPath and Regular expression searches
  • Schema Conversion
  • XSLT
  • Project Management
  • SVG Viewer and Conversion
  • Easy SOAP Invocations
  • Find in Files
  • Extension Handling

Monday, August 11, 2003

The W3C Math Working Group has published the proposed edited recommendation of the second edition of MathML 2.0. "The preparation of a Second Edition of the MathML 2.0 Specification allows the revision of that document to provide a coherent whole containing corrections to all the known errata and clarifications of some smaller issues that proved problematic. It is not the occasion for any fundamental changes in the language MathML 2.0."


Sablotron 1.0, an open source XML processor for C++, has been released. Sablotron supports XSLT 1.0, XPath 1.0, DOM Level 2, and some extension functions from EXSLT. Sablotron is dual licensed under the Mozilla Public License 1.1 and the GNU General Public License (GPL). It should run on most modern versions of Windows and Unix.


Alexandre Brillant has released JXP 1.3, a €50 shareware XPath 1.0 API that can be customized to fit different object models. This release fixes assorted bugs and optimizes a few functions.


Pekka Enberg has posted version 0.2.9 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This is a bug fix release.

Sunday, August 10, 2003

Version 1.6.2 of AxKit, the Perl-based XML Application Server Framework for Apache, has been released. AxKit converts XML to other formats such as HTML, WAP and text on the fly using either W3C standard techniques like XSLT and XInclude or custom code. This release fixes bugs and adds a few small features, including support for HTTP HEAD requests.


Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). New features in this release include:

  • Scanning of the doctype declaration in the HTML parser
  • Non-normalized attribute values in the HTML parser for XNI filters
  • Style processor settings to control validation and namespace processing
  • Ability to pass in pipeline to the style processor from standard input
Saturday, August 9, 2003

The W3C XForms Working Group has posted the proposed recommendation of XForms 1.0. According to the abstract,

XForms is an XML application that represents the next generation of forms for the Web. By splitting traditional XHTML forms into three parts—XForms model, instance data, and user interface—it separates presentation from content, allows reuse, gives strong typing—reducing the number of round-trips to the server, as well as offering device independence and a reduced need for scripting.

XForms is not a free-standing document type, but is intended to be integrated into other markup languages, such as XHTML or SVG.


Tony Graham has posted xslide 0.2.2, an emacs major mode for XML. New features include:

  • xsl-if-to-choose
  • Support for more XSLT processors
  • Improved XEmacs compatibility
Tuesday, August 5, 2003

The Apache XML Project has released version 2.5 of Xerces-J, the popular open source XML parser for Java. New features in this release include:

  • Annotation support in the XML Schema component model API
  • A preliminary XInclude implementation. XPointer and XML Base are not yet supported.
  • Implemented the latest last call DOM Level 3 Core and Load and Save working drafts
  • Modified PSVIWriter to output all PSVI information as a Xerces Native Interface (XNI) event stream rather than a file.
  • Improved error messages for schema validation
  • IPv6 support

In addition, many bugs were fixed.


Slava Pestov has uploaded the fourth pre-release of jEdit 4.2, an open source programmer's editor written in Java with extensive plug-in support and my preferred text editor on Windows and Unix. New features in this release include improved syntax highlighting for NSIS2, Ruby, and Pike. Regular expressions have been sped up. In addition many bugs were fixed.

Friday, July 25, 2003

Unless there's wireless on the beach, this will be my last update for the next couple of weeks. I'm taking a much needed vacation. I may pop in here occasionally over the next two weeks if any of the hotels I'm staying in have Internet access, but don't count on it. Regular updates should resume August 11.


I've posted version 1.0d19 of XOM, my open source streaming/tree API for processing XML with Java. This release makes a couple of backwards incompatible changes to NodeFactory. makeElement has been renamed startMakingElement and endElement has been renamed finishMakingElement. startMakingElement behaves the same as the old makeElement. However, finishMakingElement now has a slightly different contract. If it returns null, the entire element is deleted from the tree. It is no longer necessary to explicitly call detach. If it returns a different element than the one passed to it, then the old element is deleted from the tree and the new one is inserted in its place. This is more consistent with the other methods in this class. Return the node you want added to the tree, or null for no node at all.
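To make the new finishMakingElement contract concrete, here is a minimal, self-contained Java model of it. It does not use XOM at all: SketchElement, SketchFactory, DropAndRename, and TreeBuilder are hypothetical classes invented for illustration, assuming only the rule stated above (return the element to keep it, a different element to substitute it, or null to delete it).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an element node (not nu.xom.Element).
class SketchElement {
    final String name;
    final List<SketchElement> children = new ArrayList<>();

    SketchElement(String name) {
        this.name = name;
    }
}

// Hypothetical factory modeled on the contract described above.
class SketchFactory {
    SketchElement startMakingElement(String name) {
        return new SketchElement(name);
    }

    // Return the element to keep it, a different element to put in its
    // place, or null to delete it (and its subtree) from the tree.
    SketchElement finishMakingElement(SketchElement element) {
        return element;
    }
}

// Example subclass: deletes <note> elements and replaces <b> with <strong>.
class DropAndRename extends SketchFactory {
    @Override
    SketchElement finishMakingElement(SketchElement element) {
        if (element.name.equals("note")) {
            return null;
        }
        if (element.name.equals("b")) {
            SketchElement strong = new SketchElement("strong");
            strong.children.addAll(element.children);
            return strong;
        }
        return element;
    }
}

class TreeBuilder {
    // Rebuilds a source tree through the factory, depth first. Whatever
    // finishMakingElement returns is what gets attached to the parent,
    // so no explicit detach step is needed.
    static SketchElement build(SketchFactory factory, SketchElement source) {
        SketchElement made = factory.startMakingElement(source.name);
        for (SketchElement child : source.children) {
            SketchElement result = build(factory, child);
            if (result != null) {
                made.children.add(result);
            }
        }
        return factory.finishMakingElement(made);
    }

    // Renders a tree compactly, e.g. p(strong,i).
    static String show(SketchElement e) {
        StringBuilder sb = new StringBuilder(e.name);
        if (!e.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < e.children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(show(e.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SketchElement p = new SketchElement("p");
        p.children.add(new SketchElement("b"));
        p.children.add(new SketchElement("note"));
        p.children.add(new SketchElement("i"));
        System.out.println(show(build(new DropAndRename(), p))); // p(strong,i)
    }
}
```

Because the builder attaches whatever finishMakingElement returns, deletion and substitution fall out of the return value alone.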

The second big change has no API-level impact. By default, the Serializer class and toXML methods now use numeric character references to escape all tabs, carriage returns, and line feeds in attribute values and all carriage returns in text nodes. This helps make round tripping more reliable and robust. XOM is published under the LGPL.
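The reason for the new escaping is that an XML parser normalizes literal tabs, carriage returns, and line feeds in attribute values to spaces, and converts carriage returns in text to line feeds, while character references pass through unchanged. A rough sketch of the rule in Java follows. It is an illustrative reimplementation, not XOM's Serializer code; the class name, the hexadecimal reference style, and the extra escaping of > in text are choices of this sketch.

```java
// Illustrative sketch of the escaping rule described above (not XOM's code).
final class EscapeSketch {
    private EscapeSketch() {}

    // Attribute values: a parser replaces literal tab, CR, and LF with
    // spaces during attribute-value normalization, but character
    // references survive, so write those three as references.
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\t': out.append("&#x09;"); break;
                case '\n': out.append("&#x0A;"); break;
                case '\r': out.append("&#x0D;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Text nodes: end-of-line handling turns a literal CR (alone or in
    // CR LF) into LF, so only CR needs a reference; tabs and LFs survive.
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break; // keeps "]]>" out of text
                case '\r': out.append("&#x0D;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```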


In related news, Linux Magazine has posted online Rogers Cadenhead's article about XOM that was published on paper a few months ago. The article covers XOM 1.0d8 so it's a little dated, but there's still some good stuff in there.


Bare Bones Software has released BBEdit 7.0.4. This is a free update for all 7.0 users. BBEdit is the Macintosh text/HTML/XML/programmer's editor I normally use to write this page. This is a bug fix release. Mac OS 9.1 or later is required.

Thursday, July 24, 2003

From the bad ideas that wouldn't die department, it seems that the W3C is planning a "Workshop on Binary Interchange of XML Information Item Sets." It's unclear yet whether non-W3C members are welcome. So far they aren't, but you can find a little information on Marc Hadley's blog. The problems with binary XML (an oxymoron if ever there was one) are addressed in several chapters of Effective XML, most notably Item 50.

Some developers either don't believe or don't get XML's value proposition of a compatible, interoperable, editable, text format. They falsely believe that binary formats are significantly faster or smaller than XML, which is almost never true in practice. It's true some of the time, especially in the embedded space. For instance, I would not suggest encoding a bitmapped image in XML. However, in those cases no XML-like format will work, binary or otherwise. A format that is simpler than XML is required. Binary encoding the infoset simply doesn't help.

Worse yet, some vendors are deliberately trying to lock developers into their patented, closed, binary, "XML" formats so they can sell their tools. The patents probably wouldn't survive through the W3C process, but they still hope to be able to complicate XML enough that programmers will buy their editors and APIs, rather than using simple, free tools like emacs and Xerces like they are now. Text XML is too simple to sell tools for, so they hope that by making it a binary format they can convince programmers to buy their wares. I doubt they'll succeed, but that is the thought process. The binary formats actually already exist, and the market has ignored them with a resounding silence. They have achieved no traction and no interest in the community. I suspect these vendors see W3C standardization as a last ditch effort to convince programmers to buy a technology they don't want and don't need.

The vendors and developers pushing binary XML are a small minority in the community. Unfortunately they're likely to be a huge majority at the workshop and on whatever committee comes out of it. Developers who rightly think that binary XML is an oxymoron and want to kill it won't go to the workshop or join the working group. Like other W3C working groups, this one will be packed with true believers who are deaf to the claims that its work is fundamentally flawed and needs to be rejected in toto. They'll insist that we're not making useful comments because we refuse to accept their mistaken premises. Two years down the line we'll be looking at yet another awful W3C recommendation that confuses users, pollutes the XML space, and makes XML much more complicated for everyone.

This needs to be stopped now. This workshop should be cancelled, or at least not take place under the W3C umbrella. I don't really care if a group of programmers want to get together to talk about binary data formats, as long as they don't call their formats XML. I'm confident that real XML will triumph over binary formats in the many areas where it is the best solution, and fail in the few areas where it's not, as long as the W3C doesn't pollute the XML brand by claiming it's anything other than Unicode in angle brackets. For the same reasons, the SOAP Message Transmission Optimization Mechanism should be pulled. These are toxic technologies that serve no one's interests. They significantly compromise the XML promise of interoperable, interchangeable data that can be processed by a host of free, simple, readily available tools. The W3C should stand firm that XML is text and nothing but. There are no alternate serializations for the XML Infoset besides Unicode characters in angle brackets.

Wednesday, July 23, 2003

The Mozilla Project has posted the first alpha of Mozilla 1.5, the open source web browser that supports XML, CSS, XSLT, XUL, HTML, XHTML, MathML, SVG, and lots of other crunchy XML goodness. It's available for Linux, Mac OS X, and Windows. The most important new features are numerous little enhancements to Composer. They've also added logging to Chatzilla, and various other minor improvements.

Tuesday, July 22, 2003

The W3C XML Protocol Working Group has posted the first public working draft of the SOAP Message Transmission Optimization Mechanism. This spec "describes an abstract feature for optimizing the transmission and/or wire format of a SOAP message by selectively re-encoding portions of the message, while still presenting an XML Infoset to the SOAP application." The whole package would be sent in a MIME multipart envelope. Parts of the data could be sent in binary rather than text. This strikes me as very dangerous. It's not clear who's in favor of this, but the editors come from IBM, BEA and Canon. I can smell the lock-in already.

Monday, July 21, 2003

Gal Binyamini has posted JXV 0.4, an open source library that allows Java objects to be given "XML Views", and for those views to be read back into objects. (This strikes me as a little more plausible than the other direction in which you start with an XML document and build a custom Java object around it.) Essentially, this is another variation of object serialization using XML. JXV supports SAX input and output and DOM output. According to Binyamini,

JXV uses a pluggable architecture which allows XML view factories to be configured and loaded at runtime. The JXV configuration mechanisms also leverage XML namespaces to allow the configurations for those different view factories to be inlined within the JXV configuration file. In this release, JXV comes pre-configured with view factories for JavaBeans, collections, array, and "flat objects" such as Strings, primitives, etc. These factories support a wide variety of configuration options, and are sufficient for most object models. Future versions of JXV will include pre-configured support for additional factories. JXV may also release special-purpose factories (such as ones providing views for RowSets, ResultSets and other JDBC structures) as extension packages.

This release improves ease of use.

Sunday, July 20, 2003

Sun's posted the second proposed final draft specification for Java Specification Request (JSR) 172, J2ME Web Services Specification, in the Java Community Process (JCP). This basically describes a subset of JAXP and JAX-RPC intended for talking to SOAP services from Java 2 Micro Edition devices. "The goal of this optional package is to define a strict subset wherever possible of the XML parsing functionality defined in JSR-063 JAXP 1.2 [2] that can be used on the Java 2 Micro Edition Platform (J2ME)". On first reading, this draft appears to be substantially more XML-conformant than previous drafts.

The only major problems I noticed were in the SAX subset. Sun is using the confusing, underspecified SAXParser and SAXParserFactory instead of the much cleaner, better specified XMLReader and XMLReaderFactory. They've also removed ContentHandler completely and replaced it with DefaultHandler. Since this means altering many method signatures, it makes it substantially more difficult to port standard SAX programs to J2ME.
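For anyone porting code, the difference looks roughly like this. A typical SAX2 program gets an XMLReader from XMLReaderFactory and registers a ContentHandler; under the subset described above, the handler must extend DefaultHandler and go through SAXParser.parse instead. This sketch uses only Java SE classes, not the JSR 172 packages, and ElementCounter and SaxStyles are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical handler that counts start-tags. DefaultHandler implements
// ContentHandler, so it works with both parsing styles below.
class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }
}

class SaxStyles {
    // Standard SAX2 style: XMLReaderFactory plus any ContentHandler.
    // (XMLReaderFactory is deprecated in newer Java SE releases but still works.)
    static int countWithXMLReader(String xml) throws Exception {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    // JAXP style, the one the J2ME subset keeps: SAXParser plus a DefaultHandler.
    static int countWithSAXParser(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter counter = new ElementCounter();
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        parser.parse(in, counter);
        return counter.count;
    }
}
```

A handler that already extends DefaultHandler works both ways; it's code that implements or passes around a plain ContentHandler that has to change.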


In related news, IBM's alphaWorks has updated its Web Services Tool Kit for Mobile Devices to support the latest draft of this specification. This toolkit also supports WCE and SMF environments (whatever those are). However, it only supports a subset of SOAP 1.1.

Saturday, July 19, 2003

The W3C Scalable Vector Graphics Working Group has posted new working drafts of Scalable Vector Graphics (SVG) 1.2 and SVG Print. Both drafts are still quite incomplete.


The Apache XML Project has posted version 0.20.5 of FOP, the open source XSL Formatting Objects to PDF/PCL/PostScript/AWT converter. New features in this release include PDF encryption, support for CCITT Group 4 encoded TIFF files, border spacing, and Dynamic JAI support. In addition many bugs were fixed and performance was improved.


IBM's alphaWorks has released the XML Schema Quality Checker 2.2. This program reads "an XML Schema written in the W3C XML schema language and diagnoses improper uses of the schema language." This release features changes from the June 1 XML Schema 1.0 Specification Errata and Java 1.4 support.

Friday, July 18, 2003

The W3C has posted an initial draft collection of semi-standard character entity declarations for use with XML. This is based on MathML 2 and the ISO/IEC TR 9573 entity sets. According to David Carlisle, "It is hoped that this _draft_ set of definitions might form the basis of a shared, compatible set of definitions between different XML languages so that the current situation where <mo> &assymp; </mo> changes meaning if it is copied from a docbook+mathml document to a xhtml+mathml document might be avoided..."
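The problem Carlisle describes arises because an entity reference means whatever the governing DTD says it means. The Java sketch below parses the same <mo> element under two internal DTD subsets that map an entity named asymp to different characters. The two mappings (U+2248 and U+224D) are hypothetical choices made for illustration, not a claim about what any particular entity set actually defines, and EntityMeaning is a made-up class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityMeaning {
    // Parses <mo>&asymp;</mo> with an internal DTD subset that maps the
    // entity to the given character reference, and returns the resolved text.
    static String resolve(String charRef) throws Exception {
        String xml = "<!DOCTYPE mo [<!ENTITY asymp \"" + charRef + "\">]>"
                   + "<mo>&asymp;</mo>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }
}
```

Identical markup yields different characters; only a shared set of declarations removes the ambiguity.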


Stefan Champailler's posted DTDDoc 0.0.9, a JavaDoc-like tool for creating HTML documentation of document type definitions from embedded DTD comments. This release makes it easier to comment the attributes in the ATTLIST declarations and fixes some bugs in the style sheets. DTDDoc is published under the GPL.

Thursday, July 17, 2003

The Big Faceless Organization has released the Big Faceless Report Generator 1.1.8, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. Java 1.2 or later is required.


Version 1.5.1 of Jaxe, an open source (GPL) XML GUI editor written in Java 1.3, has been released. It is configurable with an XML schema and a configuration file, supports validation at element insertion, is customisable via Java modules, and can use XSLT to display documents as XHTML. A little unusually, the configuration files are written in French--ENCODAGE instead of ENCODING, BALISE instead of TAG, etc. The user interface has been localized into French, English, and German. Version 1.5.1 adds spell checking.


Jochen Wiedmann has released JaxMeXS 1.02, a parser for XML Schema written in Java. This is focused on type-annotation rather than validation. JaxMeXS is published under a revised BSD license.

Wiedmann has also released JaxMe 1.64, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with bean managed persistence (BMP). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release fixes bugs.

Wednesday, July 16, 2003

AOL is jettisoning Mozilla. Apparently they decided it was easier and more profitable to collude with Microsoft than compete. AOL has laid off most of the Mozilla developers. However, the open source Mozilla will continue, now under the auspices of the newly formed Mozilla Foundation, which will rehire several of the key developers. That's one of the big benefits of open source. Software isn't tied to the whims of one company.

The big question now is what happens to Netscape, which depends on the Mozilla code base and still has about 500 employees and a recognized brand. AOL gave $2,000,000 to the Mozilla Foundation as part of the unofficial severance package, so maybe they still recognize some value there. I've heard rumors that AOL is in the process of removing the Netscape logo from various buildings. Anyone in Mountain View want to confirm or deny?


Tomas Styblo has released Mozex 1.00, a Mozilla extension for Unix/Linux that allows the user to choose external helper programs for these actions:

  • View source
  • Edit content of textareas
  • Mailto, news, telnet and FTP links
  • Download files

Mozex is dual licensed under the Mozilla Public License and the GNU General Public License.

Tuesday, July 15, 2003

The OpenOffice Project has posted the first release candidate of OpenOffice 1.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. I used the previous 1.0 version to write Effective XML. New features since the previous beta include a "talkback" style crash reporter, support for drawing objects in headers and footers, an example XSLT filter for Office 2003 XML format, support for Microsoft Excel 95 (and older) form controls, a UNO python bridge that makes Python a first class language for creating UNO components, spell checking dictionaries for UK English and Italian, hyphenation for Danish, UK English, German, and Russian, Bitstream Vera fonts, and improved spelling suggestions using n-gram scoring (whatever that is). The weakness of the spell checker was one of my major complaints about OpenOffice 1.0. I'm very much looking forward to this. New features in 1.1 since 1.0 include import and export of PDF, Macromedia Flash, DocBook, several PDA Office file formats, flat XML and XHTML and complex text layout for languages such as Thai, Hindi, Arabic, and Hebrew.
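Because an OpenOffice document is a zip archive of XML files, the XML can be pulled out with nothing more than java.util.zip. A minimal sketch, assuming the main body is stored in an entry named content.xml (as in the OpenOffice.org 1.x file format) and that entries are UTF-8 encoded; ZippedXml is a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZippedXml {
    // Scans a zip stream and returns the named entry decoded as UTF-8,
    // or null if the archive has no such entry.
    static String readEntry(InputStream zipped, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buffer.write(chunk, 0, n);
                    }
                    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }
}
```

Called as readEntry(new FileInputStream("report.sxw"), "content.xml") (a hypothetical file name), it returns the document body as a string ready for any XML parser.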


The Jakarta Apache Project has released the Element Construction Set 1.4.2, a Java class library for generating markup code. HTML 4.0 and XML support is bundled, and other languages can be added. ECS code looks something like this HTML example from the ECS site:

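import org.apache.ecs.Document;
import org.apache.ecs.HtmlColor;
import org.apache.ecs.html.Font;
import org.apache.ecs.html.H1;
import org.apache.ecs.html.H3;

Document doc = (Document) new Document()
         .appendTitle("Demo")
         .appendBody(new H1("Demo Header"))
         .appendBody(new H3("Sub Header:"))
         .appendBody(new Font().setSize("+1")
                .setColor(HtmlColor.WHITE)
                .setFace("Times")
                .addElement("The big dog & the little cat chased each other."));
System.out.println(doc.toString());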
Monday, July 14, 2003

I've posted one more chapter from Effective XML today because it has some direct bearing on the ongoing development of Echo, which seems likely to replace RSS as a syndication format for the Web. Chapter 13, Remember Mixed Content, discusses the current escaped HTML format used by RSS feeds, and explains part of the reason it's a very bad idea. (The previously posted items 14 and 15 are also relevant.) In brief, escaping HTML was just an ugly hack to work around the many deficiencies of RSS 0.9. There is no reason to carry this nasty hack forward into the future.

If Echo is going to be an XML format, then it is crucial that it not try to violate either the letter or the spirit of well-formedness. Embedding malformed markup inside CDATA sections is a horrendous idea that makes RSS much harder to handle than it should be. Escaping HTML is a constant source of trouble for anyone attempting to process RSS with XML parsers, XSLT processors, schema validators, and other standard tools. Equally importantly, it's completely unnecessary. There is no good reason not to simply embed well-formed HTML in the appropriate elements such as description or content. Continuing to support escaped, malformed HTML in Echo will continue to produce fragile systems that don't interoperate. Directly including well-formed HTML will create a much more robust Web that's easier to process and makes life nicer for authors, readers, aggregators, syndicators, and all other producers and consumers of Echo content.

I'm not the only one who thinks so. Tim Bray has also weighed in on this topic. Discussion is taking place on a Wiki and on a blog.

From reading those discussions, I think most people recognize that escaped HTML is a really horrible kludge. It's ugly, and it causes problems. Nonetheless, most people seem willing to compromise on this. This is wrong. There must be no compromise here. There is no excuse for using escaped markup in XML documents. It must not be allowed. The only legitimate reason to escape a character like < or &, whether with entity references, CDATA sections, or numeric character references, is that the author really does want the actual < or & character to be used as text, not markup. Syndicators who cannot produce well-formed markup should be taught how to do so, not encouraged to continue in their bad habits; and until they learn, their feeds should be dropped from the community like any other malformed data. Anything less is not XML. The advantages of using real, well-formed XML for this are too obvious and too proven to be ignored. The problems with using pseudo-XML have been made equally obvious by RSS. The time has long since come to accept XML fully. Anything less is unacceptable.
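The practical difference is easy to see with a standard parser. In this sketch (plain Java SE DOM; the description element and its content are hypothetical examples, not taken from any Echo or RSS draft), escaped HTML and CDATA-wrapped HTML both come back as opaque strings that would need a second, HTML-tolerant parse, while embedded well-formed XHTML arrives as real elements that XPath, XSLT, and schema validators can see.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentDemo {
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Counts the XHTML <b> elements the parser actually sees.
    static int boldCount(Element root) {
        NodeList bolds = root.getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "b");
        return bolds.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Escaped: the markup is just characters inside a text node.
        Element escaped = parse("<description>A &lt;b&gt;bold&lt;/b&gt; claim</description>");
        // CDATA: even malformed markup (the unclosed <b>) passes as plain text.
        Element cdata = parse("<description><![CDATA[A <b>bold claim]]></description>");
        // Embedded: the markup is real, namespace-qualified XHTML.
        Element embedded = parse(
            "<description>A <b xmlns='http://www.w3.org/1999/xhtml'>bold</b> claim</description>");
        System.out.println(boldCount(escaped));        // 0
        System.out.println(boldCount(cdata));          // 0
        System.out.println(boldCount(embedded));       // 1
        System.out.println(escaped.getTextContent());  // A <b>bold</b> claim
    }
}
```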

Sunday, July 13, 2003

Bill Venners has posted Lessons Learned from JDOM, the fourth part of an interview he conducted with me at Software Development 2003 West a few months ago.

Saturday, July 12, 2003

The Apache XML Project has released Batik 1.5, an open source Scalable Vector Graphics (SVG) renderer written in Java. This release fixes many bugs and supports most of SVG 1.0.


Russell Thackston has released jSimpleX 2.1, an open source XSLT GUI written in Java. It also supports a command-line mode as well as an Ant task for automated builds. Java 1.4 or later is required. jSimpleX is published under the GPL.

Friday, July 11, 2003

Michael Kay has released Saxon 7.6.5, an experimental open source implementation of large parts of XSLT 2.0 and XPath 2.0 in Java. According to Kay, "It's mainly concerned with fixing teething troubles in the XQuery implementation, but there are some changes on the XSLT side too, in particular a rewrite of xsl:for-each-group that takes it up to the latest spec level and incorporates some performance improvements." Saxon is published under the Mozilla Public License 1.0.

Thursday, July 10, 2003

Morphon is now making the Morphon XML Editor 3.0 free beer. This XML editor provides WYSIWYG, source, and tree views, live spell checking, printing, print preview, and source code editing. It was formerly payware. Registration is optional. Java 1.3 or later is required.

Wednesday, July 9, 2003

BEA has posted the GA release (whatever that means, General Availability? Golden Alpha? Godzilla Attacks?) of XMLBeans, an XML-Java binding tool based on the W3C XML Schema Language that also provides access to the full underlying XML Infoset through an XML Cursor API. It's not open source yet, and registration is required to download it, but BEA is in the process of moving this under the Apache umbrella.

Tuesday, July 8, 2003

Geoff Oakham is working on doc2xml, an open source Python tool that converts Word 97, 2000, and 2002 documents into XML. The current 0.0.1 release is pretty rough. doc2xml is published under the GPL.


Daniel Veillard's released version 2.5.8 of libxml2, the XML C library for Gnome and version 1.0.31 of libxslt, the GNOME XSLT library for C and C++. These releases fix assorted bugs and make a few portability improvements.


Monday, July 7, 2003

I'm pleased to announce that XOM 1.0d18 is now available. 1.0d18 adds one major new feature, and this one's way cool: It is now possible to subclass NodeFactory in order to filter and/or stream your processing. XOM can now handle documents of effectively arbitrary size while using only slightly more memory than the underlying SAX parser! You'll find complete details on the main XOM page and in the JavaDoc for NodeFactory. I've also written lots of new sample programs that you'll find in the nu.xom.samples package. Many of them are streaming versions of earlier, less memory efficient samples.

This developed from an idea proposed by John Cowan, based on Simon St. Laurent's work with MOE. There have been things like this before (DOMBuilderFilter in DOM3, MOE, ElementScanner in JDOM, and of course SAX filters), but I don't think any API has done quite as neat a job as XOM now does. This is really powerful stuff. Not only does it make programs faster and much, much smaller; it also makes them much easier to write. For instance, you can easily throw away all white space only nodes on build so you're left with only the real content of the document. No more white space nodes getting in the way of your navigation. I urge you to check this out. It will radically change how you think about processing XML.

This release is API compatible with 1.0d17. All programs that compiled in 1.0d17 should still compile in 1.0d18 without any edits. XOM is published under the LGPL.
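The whitespace trick mentioned above can be approximated with plain SAX, for readers who want to see the idea before trying XOM. This is a swapped-in technique, not XOM's NodeFactory: a standard XMLFilterImpl that buffers character data between markup events and drops any run consisting entirely of XML whitespace (space, tab, CR, LF). WhitespaceStripper and survivingText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Buffers character data (which SAX may deliver in several chunks) and
// forwards it only if it contains something other than XML whitespace.
class WhitespaceStripper extends XMLFilterImpl {
    private final StringBuilder pending = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Whitespace the parser already knows is ignorable: drop it.
    }

    private void flush() throws SAXException {
        boolean onlyWhitespace = true;
        for (int i = 0; i < pending.length(); i++) {
            char c = pending.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                onlyWhitespace = false;
                break;
            }
        }
        if (!onlyWhitespace) {
            char[] text = pending.toString().toCharArray();
            super.characters(text, 0, text.length);
        }
        pending.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    // Usage sketch: runs xml through the filter and returns every text
    // event that survives, separated by '|'.
    static String survivingText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        WhitespaceStripper filter = new WhitespaceStripper();
        filter.setParent(factory.newSAXParser().getXMLReader());
        StringBuilder seen = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                if (seen.length() > 0) seen.append('|');
                seen.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen.toString();
    }
}
```

Unlike a tree-building NodeFactory, this only filters the event stream, but the downstream handler sees the same thing: real content with the indentation gone.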


Toni Uusitalo's posted Parsifal 0.7, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. This release adds support for internal and external general entities, and fixes assorted bugs. Parsifal is in the public domain.

Sunday, July 6, 2003

Remember when the browser was supposed to become the desktop? Well, it's a few years late and still a few pixels short, but it may finally be arriving. Randall Knutson has released the first prototype of Robin, a remote desktop built from XUL that runs in Mozilla based browsers.

Saturday, July 5, 2003

The W3C Technical Architecture Group (TAG) has posted a new working draft of Architecture of the World Wide Web. According to the abstract,

The World Wide Web is a networked information system. Web Architecture consists of the requirements, constraints, principles, and choices that influence the design of the system and the behavior of agents within the system. When Web Architecture is followed, the large-scale effect is that of an efficient, scalable, shared information space. The organization of this document reflects the three divisions of Web architecture: identification, representation, and interaction. This document also addresses some non-technical (social) issues that play a role in building the shared information space.

This document strives to establish a reference set of requirements, constraints, principles, and design choices for Web architecture.

Friday, July 4, 2003

The W3C XQuery Working Group has published a revised working draft of XML Query (XQuery) Requirements. "This document specifies goals, requirements, and usage scenarios for the W3C XML Query (XQuery) data model, algebra, and query language. It also includes, for each requirement, a corresponding status, indicating the current situation of the requirement in the XML Query family of specifications." This is a bug fix release.


The W3C CSS working group has posted the last call working draft of CSS3 module: Basic User Interface. According to the abstract, this working draft contains:

  • Pseudo-classes and pseudo-elements to style user interface states and element fragments respectively.
  • Additions to the user interface features in CSS2.
  • The ability to style the appearance of various standard form elements in HTML4 and properties to augment or replace some remaining stylistic attributes in HTML4.
  • Directional focus navigation properties.
  • A mechanism to allow the styling of elements as icons for accessibility.

Jochen Wiedmann's released JaxMe 1.63, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with bean managed persistence (BMP). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release fixes bugs.

Thursday, July 3, 2003

Alexandre Brillant has released JXP 1.1, a €50 shareware XPath 1.0 API that can be customized to fit different object models. This release adds support for namespaces and variables and fixes assorted bugs.


Morphon has released the Morphon XML Editor 3.1, an XML editor that provides WYSIWYG, source, and tree views. Version 3.1 is now free-beer though registration is required.


Stefan Champailler's posted DTDDoc 0.0.7, a JavaDoc-like tool for creating HTML documentation of document type definitions from embedded DTD comments. This release can display the parent of an element or attribute and can configure the title that is displayed at the top of the index. DTDDoc is published under the GPL.

Wednesday, July 2, 2003

Mozilla 1.4 has been posted for the usual batch of platforms (Windows 95+, Mac OS X, Linux, OpenVMS, Solaris, AIX, and HP/UX). Mozilla supports XML, XHTML, HTML, CSS, XSLT, MathML (with extra fonts), and more. New features since 1.3 include:

  • NTLM authentication on Windows so Mozilla can talk to Microsoft web and proxy servers that use "windows integrated security".
  • Bookmarks now include a root level folder, the ability to have two differently named bookmarks pointing at the same location, site icons in the Bookmark Manager and Bookmarks Sidebar, and labeled separators
  • Click and drag dynamic image and table resizing in Composer
  • Many usability improvements for spam control and pop-up blocking
  • Users can now specify "blank page," "home page," or "Last page visited" for each of first window, new window and new tab
  • Users can now specify default font, size and color for HTML mail compose
  • Image blocking/disabling is now more flexible and users can "view image" to see blocked or not loaded images.
  • Proxy auto-config (PAC) failover
  • Internationalized domain names

For the first time, the commercial Netscape browser has caught up with the open source Mozilla. Netscape 7.1 has also been released, and is based on Mozilla 1.4.

Friday, June 27, 2003

Frank McIngvale has released the Gnosis Utils 1.0.7, a public domain collection of Python modules for processing XML:

  • xml.pickle serializes objects to and from XML using an API compatible with the standard pickle module
  • xml.objectify turns arbitrary XML documents into Python objects
  • xml.validity checks validity against DTDs or schemas
  • xml.indexer provides full text indexing and searching of XML documents

This release makes several minor improvements.


Thursday, June 26, 2003

Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This release fixes assorted bugs.


Jerome Alet has released Jaxml 3.0.1, "a Python module designed to ease the creation of human readable XML documents." JAXML is published under the GPL. This is a bug fix release.

Wednesday, June 25, 2003

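The World Wide Web Consortium (W3C) has released the final recommendation version of SOAP 1.2. This is provided as four specs:

  • SOAP Version 1.2 Part 0: Primer
  • SOAP Version 1.2 Part 1: Messaging Framework
  • SOAP Version 1.2 Part 2: Adjuncts
  • SOAP Version 1.2 Specification Assertions and Test Collection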


The third release candidate of Mozilla 1.4 has been posted for the usual batch of platforms (Windows 95+, Mac OS X, Linux, OpenVMS, Solaris, AIX, and HP/UX). This release fixes assorted bugs, but adds no new features.


The W3C Web Content Accessibility Guidelines Working Group has posted the third public working draft of Web Content Accessibility Guidelines 2.0. Quoting from the introduction:

This document outlines design principles for creating accessible Web content. When these principles are ignored, individuals with disabilities may not be able to access the content at all, or they may be able to do so only with great difficulty. When these principles are employed, they also make Web content accessible to a variety of Web-enabled devices, such as phones, handheld devices, kiosks, network appliances, etc. By making content accessible to a variety of devices, that content will also be accessible to people in a variety of situations.

The design principles in this document represent broad concepts that apply to all Web-based content. They are not specific to HTML, XML, or any other technology. This approach was taken so that the design principles could be applied to a variety of situations and technologies, including those that do not yet exist.

Tuesday, June 24, 2003

Michael Kay has released Saxon 7.6, an experimental open source implementation of large parts of XSLT 2.0 and XPath 2.0 in Java. This release adds support for XQuery 1.0 as well.


YesLogic has released Prince 2.1, a $295 payware batch formatter for Linux and Windows that produces PDF and PostScript from XML documents with CSS stylesheets. New features in Prince 2.1 are fairly minor and include default style sheets and Red Hat 7.3 compatibility.


OpenOffice.org 1.0.3 for Mac OS X has been released. This open source office suite saves all its files in a native XML format. It can also open and save Microsoft Office documents. It includes a word processor, drawing program, spreadsheet, and presentation program. The GUI is based on X Windows rather than Aqua, though, so it's still not really a Mac application. In essence, this is dancing bear software. It's impressive that they got OpenOffice to run at all on the Mac, but it runs so poorly. (Full-screen mode doesn't work in the presentation app, menus aren't where you expect them, etc.) I can't imagine any Mac user putting up with this for more than about 5 minutes before switching back to Microsoft Office or Keynote or some other true Mac-savvy application. OpenOffice.org is dual licensed under the GPL and the Sun Industry Standards Source License.


Apple has released Safari 1.0, a web browser for Mac OS X based on the KHTML rendering engine. Safari supports direct display of XML documents with CSS stylesheets but does not support XSLT.

Monday, June 23, 2003

XML Benchmark 1.2 is a C/C++/Java toolset for benchmarking XML parsers including libxml2, Xerces, Oracle XDK, Expat, RXP, QT, and Crimson. Benchmarks include parsing (native, SAX, DOM), DOM manipulation, schema validation, XSL transformation, and XML signature and encryption. Version 1.2 adds support for XmlSec1, Apache XML Security for C, J2SE + Java WebServices Developer Pack 1.2, IBM XML4C 5.2, and Xalan 1.5. It also fixes a few bugs.

I've learned to treat benchmarks with a 20-pound bag of rock salt until proven otherwise. However, this product gets at least one thing right. It lets you plug in "Any valid XML file" so you can test parsers on the kind of documents you're interested in rather than on whatever the benchmark vendor has. Most parsers exhibit wildly varying performance characteristics depending on the type of XML document (large or small, record-like or narrative, many attributes or few attributes, etc.). It's not clear whether or not this toolset can test well-formed but invalid documents.
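
In the same spirit of testing on your own documents, here's a minimal sketch of timing one parser with JAXP's default SAX parser (whatever the running JDK ships). It is not part of XML Benchmark, and the class name ParseTimer and its methods are hypothetical. A handful of runs on a small string mostly measures class loading and JIT warm-up, so treat the early numbers with the same rock salt.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a document with the JAXP default SAX parser,
// counting elements and printing the wall-clock time of each run.
class ParseTimer {

    // Counts start-element events so each parse has an observable result.
    static final class Counter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses xml the given number of times; returns the element count. */
    static int timeParse(String xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        int count = 0;
        for (int i = 0; i < runs; i++) {
            SAXParser parser = factory.newSAXParser();
            Counter counter = new Counter();
            long start = System.nanoTime();
            parser.parse(new InputSource(new StringReader(xml)), counter);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("run " + i + ": " + micros + " microseconds");
            count = counter.elements;
        }
        return count;
    }
}
```

For real documents you'd pass a file instead, e.g. parser.parse(new File("yourdoc.xml"), counter), and run enough iterations that the timings settle down.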

Sunday, June 22, 2003

The W3C DOM Working Group has updated the Document Object Model (DOM) Level 3 Load and Save last call working draft. Changes in this draft include:

  • The DOMBuilder interface has been renamed DOMParser. DOMBuilderFilter interface has been renamed DOMParserFilter.
  • The DOMInputSource interface has been renamed DOMInput.
  • DOMInput now has getCertified and setCertified methods to indicate whether the text is known to be Unicode normalized.
  • The DOMWriter interface has been renamed DOMSerializer. DOMWriterFilter interface has been renamed DOMSerializerFilter.
  • The DOMSerializer interface has a new writeURI method that writes the output to a particular URI:

    public boolean writeURI(Node node, String URI)

  • In DOMImplementationLS the createDOMBuilder method has been changed to createDOMParser, and createDOMInputSource has been changed to createDOMInput
  • There's a new DOMOutput interface that enables "an application to encapsulate information about an output destination in a single object, which may include a URI, a byte stream (possibly with a specified encoding), a base URI, and/or a character stream."
    package org.w3c.dom.ls;

    import org.w3c.dom.DOMWriter;
    import org.w3c.dom.DOMOutputStream;

    public interface DOMOutput {

        public DOMWriter getCharacterStream();
        public void setCharacterStream(DOMWriter characterStream);

        public DOMOutputStream getByteStream();
        public void setByteStream(DOMOutputStream byteStream);

        public String getSystemId();
        public void setSystemId(String systemId);

        public String getEncoding();
        public void setEncoding(String encoding);

    }
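
The renaming churn didn't stop with this draft. In the final DOM Level 3 Load and Save Recommendation, and in the org.w3c.dom.ls package later bundled with the JDK, these interfaces are called LSParser, LSSerializer, LSInput, and LSOutput. Assuming a JDK recent enough to include that package, a minimal round trip looks something like this. The class and method names are made up for illustration, and the final API shown here is not the draft API listed above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

// Round-trips a document through Load and Save using the final
// (JDK) interface names rather than the draft names.
class LoadSaveRoundTrip {

    static String roundTrip(String xml) throws Exception {
        // The JDK's built-in DOM implementation also implements DOMImplementationLS.
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory
                .newInstance().newDocumentBuilder().getDOMImplementation();

        // Load: an LSInput (the draft's DOMInput) feeds an LSParser (the draft's DOMParser).
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        Document doc = parser.parse(input);

        // Save: an LSSerializer (the draft's DOMSerializer) writes into an
        // LSOutput (the draft's DOMOutput) wrapping a character stream.
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        StringWriter writer = new StringWriter();
        output.setCharacterStream(writer);
        serializer.write(doc, output);
        return writer.toString();
    }
}
```

By default the serializer also emits an XML declaration, so the result of LoadSaveRoundTrip.roundTrip("<greeting>hello</greeting>") begins with <?xml and then contains the greeting element.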

RenderX has released version 3.5 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. Version 3.5 adds SVG support as well as fixing various bugs and making assorted performance improvements. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $4999.95. Updates from 3.0 are free.

Saturday, June 21, 2003

Peter J. Jones has posted xmlwrapp 0.4.2, a C++ library for working with XML built on top of Daniel Veillard's libxml2. This release fixes bugs. xmlwrapp is published under a BSD license.

Friday, June 20, 2003

James Clark has released his latest XML invention, the Namespace Routing Language (NRL). According to Clark, "The XML Namespaces Recommendation allows an XML document to be composed of elements and attributes from multiple independent namespaces. Each of these namespaces may have its own schema; the schemas for different namespaces may be in different schema languages. The problem then arises of how the schemas can be composed in order to allow validation of the complete document. NRL attempts to solve this problem." NRL can combine schemas in arbitrary schema languages. The sample implementation, Jing, supports RELAX NG, W3C XML Schemas, and Schematron, and uses a SAX-based plug-in architecture that can dynamically add new schema languages. Features include:

  • "Extension of schemas not designed to be extended. For example, suppose you have an schema for XHTML which does not allow extension, but you want to embed SVG in XHTML. NRL allows you to do so without having to add wildcards to your XHTML schema."
  • Easier authoring of extensible schemas. "Instead of having to clutter up a schema with wildcards, you can write a simple schema without wildcards and then use NRL to specify what kind of extension is allowed."
  • Transparent namespaces, so <x><t:y><z/></t:y></x> can be validated like <x><z/></x>. I can see this would be useful for XSLT.
  • Contextual control of extension. For example, if a W3C XML Schema uses wildcards in different contexts, NRL can control which namespaces are allowed in each context.
  • Concurrent validation of particular namespaces or the whole document against multiple schemas in different schema languages
  • Streaming validation

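
To make the transparent namespace idea concrete, here's a rough DOM sketch of its effect: every element in a chosen namespace is replaced by its own children, so <x><t:y><z/></t:y></x> ends up looking like <x><z/></x> to whatever validates it next. This only illustrates the outcome, assuming a namespace-aware JAXP DOM. It is not NRL syntax and not how Jing implements NRL. The class and method names and the namespace URI urn:example:t are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of a "transparent" namespace: every element in
// the given namespace is replaced by its own children.
class TransparentNamespace {

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getNamespaceURI()
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Unwraps descendants of parent whose namespace URI equals ns. */
    static void unwrap(Element parent, String ns) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before mutating the tree
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) child;
                unwrap(e, ns); // handle nested transparent elements first
                if (ns.equals(e.getNamespaceURI())) {
                    // Move e's children up in front of e, then drop e itself.
                    while (e.getFirstChild() != null) {
                        parent.insertBefore(e.getFirstChild(), e);
                    }
                    parent.removeChild(e);
                }
            }
            child = next;
        }
    }
}
```

After TransparentNamespace.unwrap(doc.getDocumentElement(), "urn:example:t") on <x xmlns:t='urn:example:t'><t:y><z/></t:y></x>, the x element's only child is z.
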
Thursday, June 19, 2003

I've posted version 1.0d17 of XOM, my open source tree-based library for processing XML with Java that strives for maximum simplicity and correctness. The primary focus of this release is enhanced compatibility with Crimson (the default XML parser in Java 1.4) by working around various bugs. I've also fixed numerous bugs in the XInclude implementation. XOM may well be the most correct implementation of XInclude currently available. It is definitely better than my previous efforts for SAX, DOM, and JDOM. In addition, this release features about a dozen other random bug fixes and improvements. There are only minor API level incompatibilities in this release. (XSLTransform is now final.) Almost all code that ran with 1.0d16 should run unchanged with 1.0d17. XOM is published under the LGPL.


Mikhail Grushinskiy's XMLStarlet is a collection of command line programs written in C and based on libxml and libxslt that "can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands." Features include:

  • Check or validate XML files (simple well-formedness check, DTD, XSD, Relax NG)
  • Calculate values of XPath expressions on XML files
  • Search XML files for matches to given XPath expressions
  • Apply XSLT stylesheets to XML documents
  • Modify or edit XML documents
  • Format or pretty print XML documents
  • Browse tree structure of XML documents (in similar way to 'ls' command for directories)
  • Resolve XIncludes
  • Canonicalize documents
  • Escape/unescape special XML characters in input text
  • Print directory as XML document
  • Convert XML into PYX

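
For comparison, XMLStarlet's XPath evaluation has a close Java counterpart in the standard javax.xml.xpath API. This is neither XMLStarlet's code nor its command syntax, just a minimal sketch of the same operation; the class name XPathValue is hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Evaluates an XPath 1.0 expression against a document and returns
// the result converted to a string, as XPath's string() would.
class XPathValue {
    static String evaluate(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // This overload parses the InputSource itself and returns a String.
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }
}
```

For example, both "/a/b[2]" and "count(//b)" evaluate to "2" against <a><b>1</b><b>2</b></a>.
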
Wednesday, June 18, 2003

Nokia has submitted JSR-226, Scalable 2D Vector Graphics API for J2ME to the Java Community Process (JCP). Quoting from the JSR,

This specification will define an optional package API for rendering scalable 2D vector graphics, including image files in W3C Scalable Vector Graphics (SVG) format. The API is targeted for J2ME platform, with primary emphasis on MIDP. The main use cases for this API are map visualization, scalable icons, and other advanced graphics applications.

The main target platform of this API is J2ME/CLDC/MIDP. The API is targeted at CLDC class devices that typically have very little processing power and memory, and no hardware support for 2D graphics or floating point arithmetic. The API shall allow utilization of native 2D graphics features of the device when applicable.

The API should include:

  • Ability to load and render external 2D vector images, stored in the W3C SVG Tiny format.
  • Rendering of 2D images that are scalable to different display resolutions and aspect ratios.

The EG shall consider possibilities for subsetting from Java 2D API / JSR-209. Where subsetting is not possible, the API should be efficiently implementable on top of the Java 2D API / JSR-209. The API should be rich enough to support an SVG Tiny implementation.

Last night at the monthly meeting of the New York XML SIG, David Megginson pointed out that the W3C is schizophrenic. It is both a research and development organization and a standards organization, and it often uses its standards wing to attempt to foreclose competition with its R&D. Think of schemas.

This morning I'm realizing the JCP is like that too. I've never even seen an SVG API, and hardly anyone is using one, yet here's Nokia proposing to standardize one (and charge $50,000 + $20,000 per year for the technology compatibility kit). The recent JSR 225 XQuery API for Java (XQJ) is similar. Both of these are very good subjects for research. Neither is yet appropriate for standardization, nor should we assume that whatever products come out of these groups' research will automatically be standardized.

Perhaps what's needed is a more formal R&D organization for Java, one that is not a standards body, where different interested groups like Sun, Apache, IBM, and Nokia can come together to work on topics of mutual interest. In the spirit of scientific research, it could be a genuinely open process that doesn't require NDAs and big licensing fees. Such an organization should explicitly allow forking and incompatible implementations as part of the experimental process. If we don't try different approaches, we won't know which ones work best.

Tuesday, June 17, 2003

Opera Software ASA has released version 7.11 of their namesake web browser for Windows that supports direct display of XML with attached CSS style sheets. Version 7.11 "includes miscellaneous fixes to usability, accessibility, privacy, security, stability, plug-ins support, as well as various improvements to the M2 e-mail client." This release is available in Brazilian, Chinese (simplified), Chinese (traditional), Danish, Dutch, English, Finnish, French, German, Japanese, Norwegian (bokmål and nynorsk), Polish, Portuguese, Russian, Spanish, and Swedish.


Microsoft has released an XSLT stylesheet that can convert Microsoft Office 2003 WordML files in the Beta2 namespace http://schemas.microsoft.com/office/word/2003/2/wordml to HTML. "This version of the XSL Transformation will not render images and objects properly, but that will be fixed for the final version. This is just a beta version of the transformation, and is not supported." Windows only, natch.

Monday, June 16, 2003

The W3C and the Unicode Consortium have updated Unicode Technical Report #20, Unicode in XML and other Markup Languages. This document covers many less familiar aspects of Unicode as they affect XML. Topics include bidirectional controls, line breaks, compatibility characters, and so forth. It explains which characters should and should not be used in XML documents. (Some of the Unicode controls are more appropriately handled by markup in an XML environment.) This release updates the document to cover Unicode 4.0.
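
As a rough illustration of the kind of check the report motivates, here's a sketch that scans text for a few characters UTR #20 recommends replacing with markup: the line and paragraph separators U+2028 and U+2029, the bidi embedding controls U+202A through U+202E, and the interlinear annotation characters U+FFF9 through U+FFFB. This is a hand-picked subset, not the report's full table, and the class and method names are hypothetical. Consult the current report before relying on any list.

```java
// Hypothetical checker that flags a small, hand-picked subset of the
// characters UTR #20 says are better expressed with markup.
class MarkupUnsuitable {

    static boolean isFlagged(int cp) {
        return cp == 0x2028 || cp == 0x2029           // line / paragraph separator
            || (cp >= 0x202A && cp <= 0x202E)         // bidi embedding and override controls
            || (cp >= 0xFFF9 && cp <= 0xFFFB);        // interlinear annotation characters
    }

    /** Returns the char index of the first flagged code point, or -1 if none. */
    static int firstFlagged(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // handles surrogate pairs correctly
            if (isFlagged(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```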


Version 1.4.2 of Jaxe, an open source (GPL) XML GUI editor written in Java 1.3, has been released. It is configurable with an XML schema and a configuration file, supports validation at element insertion, is customisable via Java modules, and can use XSLT to display documents as XHTML. A little unusually, the configuration files are written in French--ENCODAGE instead of ENCODING, BALISE instead of TAG, etc. The user interface has been localized into French, English, and German. Version 1.4.2 can preserve white space and fixes a few bugs.

Sunday, June 15, 2003

Alexandre Brillant has released JXPath 1.0, a €50 shareware XPath 1.0 API that can be customized to fit different object models. It's not immediately clear how conformant it is to the XPath spec. The advertised test cases don't come close to covering the space of possible XPath expressions. Update: Brillant has renamed the product JXP to avoid confusion with the Apache Project's own JXPath.


The W3C Voice Browser Working Group has published the third working draft of Voice Browser Call Control: CCXML Version 1.0. According to the spec abstract, "CCXML is designed to provide telephony call control support for VoiceXML or other dialog systems. CCXML has been designed to complement and integrate with a VoiceXML system. Because of this you will find many references to VoiceXML's capabilities and limitations. You will also find details on how VoiceXML and CCXML can be integrated. However it should be noted that the two languages are separate and are not required in an implementation of either language. For example CCXML could be integrated with a more traditional IVR system and VoiceXML or other dialog systems could be integrated with some other call control systems."

Saturday, June 14, 2003

W. Eliot Kimber has submitted a note to the W3C on XIndirect, "a simple mechanism for using XML to represent indirect addresses in order to augment the core functionality of XLink and XPointer without requiring either of those specifications to themselves require support for indirect addresses. The facilities defined are specifically designed to meet the requirements for systems that support the authoring and management of complex systems of documents."

I haven't quite grokked the syntax yet, but it looks interesting. I think the basic idea is simply that instead of pointing directly to the document you want to link to, you point to a holder for the address of that document. That is, this is the difference between a pointer and a handle (pointer to pointer). This allows the linked-to resources to change addresses without updating the linking resources, as long as you update the intermediate indirect.
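
The pointer-versus-handle idea is easy to model with an ordinary map. In the hypothetical sketch below, a link stores only the name of an indirection entry, and resolving the link looks up that entry's current address. This models the general technique only; it is not XIndirect's syntax or processing model, and all names and URLs are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of indirect addressing: links hold a handle (the name
// of an indirection entry), never the target address itself.
class IndirectLinks {
    private final Map<String, String> indirects = new HashMap<>();

    /** Creates or updates the address an indirection entry points to. */
    void setAddress(String handle, String address) {
        indirects.put(handle, address);
    }

    /** Resolves a handle to its current address: one extra lookup per link. */
    String resolve(String handle) {
        String address = indirects.get(handle);
        if (address == null) {
            throw new IllegalArgumentException("No indirect named " + handle);
        }
        return address;
    }
}
```

If "spec" first points at http://example.com/v1/spec.xml and is later updated to http://example.com/v2/spec.xml, every link holding the handle "spec" resolves to the new address without itself being edited.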

Friday, June 13, 2003

Rob Rohan has posted Treebeard 0.8, an open source XSLT IDE written in Java. Treebeard has a pluggable XML and XSLT parser architecture, and comes bundled with Xalan 2.5 and Saxon 7.5.

Thursday, June 12, 2003

The W3C DOM Working Group has posted the last call working draft of Document Object Model (DOM) Level 3 Core Specification. This draft has a better description of DOM features, and various other editorial improvements. API level changes since the last draft include:

  • DOMException now has an additional TYPE_MISMATCH_ERR code to be used when "the type of an object is incompatible with the expected type of the parameter associated to the object."
  • The version, standalone, and encoding attributes of Document and Entity have been renamed xmlVersion, xmlStandalone, and xmlEncoding. It's not clear why. What else would they be? This seems redundant. Perhaps it's intended to avoid confusion with HTML DOMs? If so, should these fields really be here at all?
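
For what it's worth, these names did end up in the org.w3c.dom.Document interface the JDK ships, as getXmlVersion(), getXmlEncoding(), and getXmlStandalone(). Here's a minimal sketch, assuming the JDK's built-in parser, that reads them back. They report what the XML declaration said, which is separate from getInputEncoding(), the encoding actually used to read the input. The class name DeclarationInfo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Reads the XML declaration properties that DOM Level 3 exposes on Document.
class DeclarationInfo {

    static Document parse(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    static String describe(Document doc) {
        return "version=" + doc.getXmlVersion()
                + " encoding=" + doc.getXmlEncoding()
                + " standalone=" + doc.getXmlStandalone();
    }
}
```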

The W3C Web Services Description Working Group has posted three new working drafts of WSDL 1.2:

Web Services Description Language (WSDL) Version 1.2: Core Language
"Web Services Description Language (WSDL) provides a model and an XML format for describing Web services. WSDL enables one to separate the description of the abstract functionality offered by a service from concrete details of a service description such as "how" and "where" that functionality is offered."
Web Services Description Language (WSDL) Version 1.2: Message Patterns
"Web Services Description Language (WSDL) message patterns define the sequence, direction, and cardinality of abstract messages sent or received by an operation. By design, WSDL message patterns abstract out specific message types; placeholders for messages identified by the pattern are associated with specific message types by the operation using the pattern. Unless explicitly stated otherwise, WSDL message patterns also abstract out binding-specific information like timing between messages, whether the pattern is synchronous or asynchronous, and whether the message are sent over a single or multiple channels."
Web Services Description Language (WSDL) Version 1.2: Bindings
"WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. WSDL Version 1.2 Bindings describes how to use WSDL in conjunction with SOAP 1.2 [SOAP 1.2 Part 1: Messaging Framework], HTTP/1.1 GET/POST [IETF RFC 2616], and MIME [IETF RFC 2045]."

Malcolm Wallace and Colin Runciman have released version 1.09 of HaXml, a bug fix release of the XML processing library for the Haskell language. According to the web page,

HaXml is a collection of utilities for using Haskell and XML together. Its basic facilities include:

  • a parser for XML,
  • a separate error-correcting parser for HTML,
  • an XML validator,
  • pretty-printers for XML and HTML.

For processing XML documents, the following components are provided:

  • Combinators is a combinator library for generic XML document processing, including transformation, editing, and generation.
  • Haskell2Xml is a replacement class for Haskell's Show/Read classes: it allows you to read and write ordinary Haskell data as XML documents. The DrIFT tool (available from http://repetae.net/~john/computer/haskell/DrIFT/) can automatically derive this class for you.
  • DtdToHaskell is a tool for translating any valid XML DTD into equivalent Haskell types.
  • In conjunction with the Xml2Haskell class framework, this allows you to generate, edit, and transform documents as normal typed values in programs, and to read and write them as human-readable XML documents.
  • Finally, Xtract is a grep-like tool for XML documents, loosely based on the XPath and XQL query languages. It can be used either from the command-line, or within your own code as part of the library.

HaXml is distributed under the Artistic License.

Wednesday, June 11, 2003

IBM and Oracle have submitted Java Specification Request (JSR) 225, XQuery API for Java (XQJ), to the Java Community Process. "This specification will define a set of interfaces and classes that enable an application to submit XQuery queries to an XML data source and process the results of these queries. The design of the API will also take into account precedents established by other JSRs, notably JDBC and JAXP."

I have to say I think this is way too soon. XQuery is not finished yet, and there's virtually no experience with XQuery APIs in the community. Standardization should wait until there have been a number of different APIs, and we can begin to see what works and what doesn't. Otherwise there's a strong risk of repeating the disastrous experience of JAXP which is causing problems to this day (and JAXP was standardized much later in the life of XML than it is now with respect to XQuery). Design by standard is simply not a good idea. It locks in mistakes and locks out too many good ideas. IBM and Oracle and whoever else is interested should design their own APIs for XQuery, outside the JCP, and only later bring these together in a standards effort if a common API seems useful.

Tuesday, June 10, 2003

Propylon has released PropelX 2.0.2, a J2EE implementation of the XPipe approach to XML transformation. Registration is required for download.

Monday, June 9, 2003

Version 2.0.2 of the payware <Oxygen/> XML editor has been released. Oxygen supports XML, XSL, DTDs, and the W3C XML Schema Language. New features in version 2.0.2 include:

  • Relax NG support
  • Bundled TEI DTDs, stylesheets and templates
  • Import HTML documents as XHTML
  • Templates for XHTML documents
  • Code-Insight for entities
  • Attribute values specified as fixed in the DTD are now presented in the Code-Insight.
  • Drag & Drop in Tree View
  • Source/Tree View Synchronization
  • Indent selection
  • Menu options enabled or disabled based on the editing context
  • Insert file at cursor
  • Configurable mapping between a file type and a syntax highlight scheme.
  • Backup copy on save

Oxygen requires Java 1.3 or later. It costs $74.

Sunday, June 8, 2003

Dimitre Novatchev has released EXSLT for MSXML4 1.0, an open source implementation of parts of the standard XSLT extension library. Extension functions it supports include:

  • intersection()
  • difference()
  • has-same-node()
  • distinct()
  • leading()
  • trailing()

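
The semantics of these set functions are easiest to see outside XSLT. The sketch below is a hypothetical Java model, not Novatchev's implementation: a node-set is represented as a list of node positions sorted in document order, so node identity is just position equality. leading() and trailing() follow the EXSLT set module's definitions: the nodes of the first set before (or after) the first node of the second set, an empty result if that node isn't in the first set, and the whole first set if the second set is empty. distinct() is left out because it compares string values, which this model doesn't carry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of some EXSLT set functions. A node-set is a list of
// node positions sorted in document order; equal positions mean the same node.
class NodeSets {

    static List<Integer> intersection(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        for (Integer n : a) {
            if (b.contains(n)) {
                result.add(n);
            }
        }
        return result;
    }

    static List<Integer> difference(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        for (Integer n : a) {
            if (!b.contains(n)) {
                result.add(n);
            }
        }
        return result;
    }

    static boolean hasSameNode(List<Integer> a, List<Integer> b) {
        return !intersection(a, b).isEmpty();
    }

    /** Nodes of a that precede the first node of b. */
    static List<Integer> leading(List<Integer> a, List<Integer> b) {
        if (b.isEmpty()) return new ArrayList<>(a);
        int pivot = a.indexOf(b.get(0)); // b is in document order
        if (pivot < 0) return new ArrayList<>();
        return new ArrayList<>(a.subList(0, pivot));
    }

    /** Nodes of a that follow the first node of b. */
    static List<Integer> trailing(List<Integer> a, List<Integer> b) {
        if (b.isEmpty()) return new ArrayList<>(a);
        int pivot = a.indexOf(b.get(0));
        if (pivot < 0) return new ArrayList<>();
        return new ArrayList<>(a.subList(pivot + 1, a.size()));
    }
}
```

With a = [1, 3, 5, 7] and b = [5, 9], leading gives [1, 3], trailing gives [7], intersection gives [5], and difference gives [1, 3, 7].
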
Saturday, June 7, 2003

Sun's released version 1.2 of the Java Web Services Developer Pack (JWSDP). This bundles together a number of XML-related technologies including:

  • JavaServer Faces (JSF) v1.0 EA4
  • XML and Web Services Security v1.0 EA
  • Java Architecture for XML Binding (JAXB) v1.0.1
  • Java API for XML Processing (JAXP) v1.2.3
  • Java API for XML Registries (JAXR) v1.0.4
  • Java API for XML-based RPC (JAX-RPC) v1.1 EA
  • SOAP with Attachments API for Java (SAAJ) v1.2 EA
  • JavaServer Pages Standard Tag Library (JSTL) v1.1 EA
  • Java WSDP Registry Server v1.0_05
  • Ant Build Tool 1.5.2
  • Apache Tomcat 5 servlet container
  • WS-I Supply Chain Management Sample Application 1.0 EA

There are a lot of bug fixes throughout this release. In addition, "JAX-RPC and SAAJ have been extended to add support for the WS-I basic profile 1.0 draft", and there are some new sample applications. Xerces 2.3 is now the bundled parser. JAXB 1.0.1 has experimental support for RELAX NG schemas. The download is over 40 MB. Java 1.3.1 or later is required for at least some of the bundled technologies. You'll need to use the endorsed standards override mechanism (i.e. jre/lib/endorsed) to install some of this in Java 1.4.x. Otherwise, the older versions bundled with the JDK will take precedence.

Friday, June 6, 2003

Pekka Enberg's posted version 0.2.5 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This is a bug fix release.

Thursday, June 5, 2003

BEA Systems has posted the first public review draft of Java Specification Request (JSR) 173, Streaming API for XML (StAX), to the Java Community Process. This JSR proposes a Java-based, pull-parsing API for XML. StAX offers two approaches. XMLStreamReader and XMLStreamWriter are a cursor API designed to read and write XML as efficiently as possible. XMLEventReader and XMLEventWriter are an iterator API designed to be easy to use, event based, easy to extend, and allow easy pipelining. The iterator API sits on top of the cursor API. There is a reference implementation bundled with the spec and JavaDoc. I haven't had time to write code with it yet, or to test the performance; but overall from the spec and JavaDoc I'd say this is the cleanest, most XML conformant pull parser I've seen to date. It's definitely a substantial improvement on XMLPULL. Comments are due by July 4.
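
To give a feel for the cursor style, here's a small sketch written against javax.xml.stream, the package StAX eventually shipped as in Java 6 and later. This public review draft may differ in details, so treat it as illustrative only; the class name StaxCount is hypothetical. Note that the loop pulls each event itself rather than receiving callbacks the way SAX does.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Pull-parses a document with the StAX cursor API and counts the
// elements with a given local name.
class StaxCount {
    static int countElements(String xml, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            // The application asks for the next event; nothing is pushed to a handler.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```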


Stéphane Conversy and Jean-Daniel Fekete have posted an alpha of svgl, a library that displays SVG pictures using OpenGL, taking advantage of the GPU. It compiles and runs on Linux, Mac OS X, and Windows (using Cygwin). svgl is published under the LGPL.


Oleg Tkachenko has released nxslt 1.2, a Windows command line utility for accessing the .Net XSLT engine. New features in this release include support for 60 EXSLT extension functions as well as custom extension functions. nxslt is written in C# and requires the .NET Framework version 1.0 to be installed.

Wednesday, June 4, 2003

The XML Apache Project has released Xalan-Java 2.5.1, an open source XSLT processor. Most of the changes in this release are bug fixes or small performance improvements. There's also an alternate binary distribution that puts XSLTC and Xalan-Interpretive in separate jar files so they can be distributed or bundled independently of each other.
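
Xalan, Saxon, and the JDK's built-in XSLTC can all be driven through the same JAXP TransformerFactory interface, so the calling code doesn't change when you swap processors. Here's a minimal sketch, assuming whichever XSLT 1.0 processor JAXP finds; the class name and the sample stylesheet are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Applies an XSLT stylesheet through JAXP. Which processor runs is decided
// by TransformerFactory's lookup rules, not by this code.
class ApplyStylesheet {
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Setting the javax.xml.transform.TransformerFactory system property, or putting a different processor's jar on the classpath, changes which engine actually runs.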

Tuesday, June 3, 2003

SKYRiX AG has released the SKYRiX Libraries for XML processing 4.2-3, an open source Objective C class library for processing XML. This contains:

  • An Objective C port of SAX2
  • An Objective C wrapper for CoreFoundation XML
  • An Objective C wrapper for libxml2
  • An Objective C wrapper for libical
  • An Objective C wrapper for expat
  • An Objective C wrapper for plists
  • An Objective C wrapper for pyx
  • A DOM implementation for Objective C
  • An XML-RPC implementation

The libraries are published under the Lesser General Public License (LGPL).

Monday, June 2, 2003

I've posted version 1.0d16 of XOM, my open source tree-based library for processing XML with Java that strives for maximum simplicity and correctness. The primary focus of this release is unit tests and bug fixes for XSLT. In addition, there are about a dozen other random bug fixes and improvements. There are only minor API level incompatibilities in this release. (SAXConverter and DOMConverter have been moved from the core into a new converters package, but are otherwise the same.) Almost all code that ran with 1.0d15 should run unchanged with 1.0d16. XOM is published under the LGPL.


The W3C Web Ontology Working Group has a new last call working draft of OWL Web Ontology Language Test Cases:

As part of the definition of the Web Ontology Language (OWL) the Web Ontology Working Group provides a set of test cases. This document presents those test cases. They are intended to provide examples for, and clarification of, the normative definition of OWL found in [OWL Semantics and Abstract Syntax] to which this document is subsidiary.

This document describes the various types of test used and the format in which the tests are presented. Alternative formats of the test collection are provided. These are intended to be suitable for use by OWL developers in test harnesses, possibly as part of a test driven development process, such as Extreme Programming [XP]. The format of the Manifest files used as part of these alternative formats is described.


The ISO/IEC JTC1 SC24 and the W3C PNG Group have posted the proposed recommendation of the second edition of the Portable Network Graphics specification. PNG is a binary format designed to replace the patent encumbered GIF. It is not based on XML. "This International Standard is strongly based on the W3C Recommendation 'PNG Specification Version 1.0' which was reviewed by W3C members, approved as a W3C Recommendation and published in October 1996. This second edition incorporates all known errata and clarifications."

Sunday, June 1, 2003

Sun and IBM have posted the public review draft specification for Java Specification Request (JSR) 105, XML Digital Signature APIs. At first glance, this appears to assume you're working with DOM trees exclusively, not SAX or JDOM or anything else. Comments are due by June 29.

Saturday, May 31, 2003

The first release candidate of Mozilla 1.4 is out for the usual batch of platforms: Windows, Linux, OpenVMS, Solaris, AIX, HP/UX, and Mac OS X. (In fact, pretty much everything except Mac OS 9, which the Mozilla Project has effectively abandoned, much to my annoyance, especially since their last 1.2.1 release for Mac OS 9 has lots of nasty bugs.) The popular open source web browser supports XML, CSS, XSLT, XHTML, HTML, DOM Level 1 and Level 2, and JavaScript. Java and Flash support are available through plug-ins. New features in 1.4 include dynamic image and table resizing in Composer, smooth scrolling (disabled by default) and improvements to spam filtering as well as bug fixes addressing speed, stability, standards support and website compatibility. Bookmarks now include a root level folder, the ability to have two differently named bookmarks pointing at the same location, site icons in the Bookmark Manager and Bookmarks Sidebar, and labelled separators. Changes since the beta release are mostly bug fixes.

Friday, May 30, 2003

Slava Pestov has uploaded the second pre-release of jEdit 4.2, an open source programmer's editor written in Java with extensive plug-in support and my preferred text editor on Windows and Unix. New features in this release include a VIM/Emacs-style "kill ring", quick copy between text areas, files in the favorites list, and some new syntax highlighting modes.


Tony Graham has posted xslide 0.2.1, an Emacs major mode for editing XSL stylesheets.

Thursday, May 29, 2003

I've posted version 1.0d15 of XOM, my tree-based library for processing XML with Java that strives for maximum simplicity and correctness. The primary focus of this release is XInclude. I believe that, modulo undiscovered bugs, this is now a fully conformant implementation of the XInclude Candidate Recommendation including support for fallbacks, the XPointer element() scheme, and preservation of base URI information using xml:base attributes. There are 24 unit tests for XInclude added in this release as well to try to keep the undiscovered bugs to a minimum.

There were a couple of other minor improvements in this release as well. The Element.getChildElements(String name, String namespaceURI) method now allows a null or empty string local name to stand for any local name, so you can use this method to get all elements in a certain namespace. Serializer no longer wraps and indents text when xml:space="preserve", regardless of the setting of indents and maxlength. This release should be completely compatible with code written against 1.0d14. You should not even need to recompile existing programs.

In related news, Bill Venners has posted part I of his interview with me at Software Development West back in March. The interview focuses on XOM and the principles that influenced its design. Finally, Linux Magazine has posted Java XOM: XML Made Simpler, an introductory article about XOM by Rogers Cadenhead. He wrote it with XOM 1.0d8, but looking at the code I think it all still applies to 1.0d15.

Wednesday, May 28, 2003

The XML Apache Project has released Xerces C++ 2.3.0, a schema-validating XML parser written in fairly portable C++. New features in this release include pluggable memory management, a pluggable panic handler, defense against the billion laughs attack, and partial implementation of DOM level 3 normalization.

Tuesday, May 27, 2003

Fraunhofer IPSI has released IPSI-XQ 1.3.0, an XQuery implementation that supports the latest May 2 working drafts. There's a GUI and a Java API.


YesLogic has released Prince 2.0, a $295 payware batch formatter for Linux and Windows that produces PDF and PostScript from XML documents with CSS stylesheets. New features in Prince 2.0 include:

  • Multi-Column Layout and Floating Blocks
  • Lists, Counters and Generated Content
  • Page Headers / Footers and Duplex Printing
  • Many additional CSS properties and selectors
  • Graphical User Interface for Microsoft Windows

Saturday, May 24, 2003

Norm Walsh has released version 1.61.2 of his XSLT stylesheets for DocBook. This release adds an Arabic locale and fixes one nasty bug.

Friday, May 23, 2003

Michael Kay has released version 7.5.1 of Saxon, an experimental open source XSLT processor written in Java that supports large parts of the latest drafts of XSLT 2.0 and XPath 2.0. This release just fixes a tail recursion bug introduced in 7.5.0. Most normal users should stick with Saxon 6.5.2 and XSLT 1.0 for the time being.


The OpenOffice Project has posted the second beta of OpenOffice 1.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. I used the previous 1.0 version to write Effective XML. New features since the first beta include printer independent layout, GUI support for XSLT based filters, and enhanced OLE editing on Windows. Of course many bugs have been fixed as well. New features since 1.0 include import and export of PDF, Macromedia Flash, DocBook, several PDA office file formats, flat XML, and XHTML, as well as complex text layout for languages such as Thai, Hindi, Arabic, and Hebrew.


Thursday, May 22, 2003

The W3C has released the final version of their Patent Policy. In brief, it requires that all patent claims for W3C specs be available on a no-cost basis to implementations of the recommendation, except under extreme circumstances. This still has some GPL compatibility issues, but it's much better than it used to be.


Opera Software ASA has released version 6.02 of their namesake web browser for Macintosh, which supports direct display of XML with attached CSS style sheets. New features in this release include kiosk mode. In addition, 6.02 has various bug fixes and speed ups. English, French, Japanese, and German versions are available now, with more languages due over the next couple of weeks. Opera is $39 payware, or free-beer adware.


Apple's released an update to beta 2 of their Safari web browser for Mac OS X. This is now beta 2 (v74) or some such. I don't know why they didn't just call it beta 3, but anyway. Apple says, "This update is recommended for all Safari users and improves how Safari validates the authenticity of websites that use SSL certificates." Safari supports direct display of XML documents with CSS stylesheets but does not support XSLT.

Wednesday, May 21, 2003

Norm Walsh has released version 1.61.1 of his XSLT stylesheets for DocBook. This release features some small enhancements and bug fixes.

Tuesday, May 20, 2003

The Mozilla Project has released version 0.6 of Firebird (nee Phoenix), a light-weight browser for Windows and Linux based on Mozilla's Gecko engine. This release also adds preliminary support for Mac OS X. Firebird supports all the yummy XML features, but doesn't include the e-mail program, news reader, or nose hair trimmer. Firebird differs from similar efforts like Galeon in that it's based on XUL and is designed for cross-platform release on Linux and Windows. New features in this release include:

  • New default theme
  • Redesigned Preferences window
  • Easy erasure of all stored private data (cookies, passwords, history, etc.)
  • Context menus for bookmarks
  • Talkback when Firebird crashes
  • Automatic Image Resizing
  • Smooth Scrolling

OOPS Consultancy has released XMLTask 1.6, an Ant task that can modify "XML files as part of an Ant build. Unlike the standard filter task provided with Ant, it is XML-sensitive, but doesn't require you to define XSLTs."

Monday, May 19, 2003

I've posted version 1.0d14 of XOM, my tree-based API for processing XML with Java. XOM strives for maximum simplicity and absolute correctness. This release has undergone extensive profiling and optimization for both memory footprint and speed. It is much faster and the objects created are much smaller. Speed and space wise it should now be competitive with other tree-based APIs like DOM and JDOM. API level changes since 1.0d11 (the last one I announced here) are fairly minor. They include:

  • The beginnings of a new nu.xom.benchmark package, though it's not even close to stable yet. None of the classes in here are public, but you can run the programs to get some rough timing measurements. (Timing was actually done using a profiler running sample applications, not this package. However, in the future I'd like to automate this more.)
  • Element.insertChild(String, int) now throws a NullPointerException if the first argument is null.
  • The insertBefore and insertAfter methods have been removed from ParentNode.
  • The serializer now recognizes the IBM037 encoding (a.k.a. CP037, EBCDIC-CP-US, EBCDIC-CP-CA, EBCDIC-CP-WA, EBCDIC-CP-NL, and CSIBM037).
  • If the serializer's line separator is set, then all line separators are changed to that separator on output. If the line separator is not explicitly set, then all line breaks in source text are preserved as is.
  • The arguments to insertChild (and checkInsertChild and checkRemoveChild) have been reversed. These methods are now:
    public void insertChild(Node child, int position)
    protected void checkInsertChild(Node child, int position)
    protected void checkRemoveChild(Node child, int position)
    
  • The removeChild methods now return the Node they remove:
    public Node removeChild(int position)
    public Node removeChild(Node child)
  • The Builder method

    public Document build(String document, String baseURI)

    is now declared to throw an IOException like the other build() methods
  • The equals() and hashCode() methods were removed from the XSLTransform class.
  • Several additional methods in Element were marked final: getAttributeCount(), getNamespacePrefix(int index), removeChildren(), and getAttribute(int).
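The new line separator rule can be sketched as a small string transformation. This is a hypothetical stand-alone method, not XOM's actual Serializer code; it assumes "not set" is represented by null and treats CR LF, a lone CR, and a lone LF each as one line break.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineSeparators {

    // CRLF must be tried first so that it counts as a single line break, not two.
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\r|\n");

    // A null separator means "not set", so line breaks are preserved as is;
    // otherwise every line break becomes the chosen separator.
    static String normalize(String text, String separator) {
        if (separator == null) {
            return text;
        }
        return LINE_BREAK.matcher(text).replaceAll(Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\rthree\nfour";
        System.out.println(normalize(mixed, "\n").equals("one\ntwo\nthree\nfour")); // true
        System.out.println(normalize(mixed, null).equals(mixed));                  // true
    }
}
```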

In addition, the unit tests have been expanded significantly, which resulted in the detection and elimination of numerous bugs. More details about all of this are on the XOM page.

Sunday, May 18, 2003

Oleg Tkachenko has posted the first beta (following an earlier alpha release) of an XInclude.NET library, an open source implementation of the XInclude 1.0 Candidate Recommendation written in C# for the .NET platform. The beta adds support for the XPointer framework and the shorthand pointer, element(), xmlns(), and xpath1() schemes. The .NET Framework 1.0 is required.

Saturday, May 17, 2003

Several people have written to me to inquire about the missing XIncluder files. IBiblio experienced some disk problems during an upgrade in the last week that took down the ftp server, among other things. It's currently being restored from backup. All files should be in place by early next week, but for such a large archive (many Linux distros, etc.) restoring from tape is a slow procedure.


CenterPoint XML 2.1.7 has been released. This is a C++ class library for reading and writing XML documents that supports SAX 2 and DOM Level 1 and Level 2. CenterPoint XML is written in standard C++ and works on many platforms including Solaris, HP-UX, Mac OS X, Linux, Microsoft Windows NT/2000/XP, and OpenVMS. CenterPoint XML is published under a Netscape-derived license.


Syntext has released Dtd2Xs 1.1, a tool for converting complex, modularized XML DTDs to W3C XML Schema Language schemas. Dtd2Xs runs on Windows and Linux. Dtd2Xs is $49 on Windows, $39 on Linux, and free for non-commercial use.

Friday, May 16, 2003

The W3C Device Independence Working Group has posted the first public working draft of Core Presentation Characteristics: Requirements and Use Cases.

The intended purpose of the Core Presentation Characteristics recommendation will be to define a common set of presentation properties or attributes that:

  • can be reused in different delivery context vocabularies
  • share common semantics
  • simplify the task of interpreting these attributes when adapting or authoring content for presentation in different delivery contexts

Therefore, the scope of the Core Presentation Characteristics definitions is restricted to attributes that are 'core' for the 'presentation' of web content. A more thorough explanation of the meaning of these two terms is presented below.

Presentation characteristics are those properties that define some aspects of the way in which content may be presented to a user of an access mechanism. Presentation characteristics are directly related to the presentation model being used. For example, when rendering some HTML with CSS visual styling, CSS defines a presentation model which includes, for example, the visual area within which the presentation is to be made, and the fonts with which text can be rendered. Similarly, when requesting an image to be rendered as part of a presentation, there is a presentation model which includes the image size and resolution at which it is to be displayed. For an audio presentation of some text using a text-to-speech model, the presentation model may include the available voices with which the text can be rendered.

Core presentation characteristics are those that are relevant to almost every device using a particular presentation model. This excludes from the core any attributes that are specific to only a small subset of devices using a given presentation model.


The W3C Timed Text Working Group has posted the first public working draft of the Timed Text (TT) Authoring Format 1.0 Use Cases and Requirements. "Timed text is textual information that is intrinsically or extrinsically associated with timing information", such as subtitles or closed captions. One of the requirements is that this will be based on XML and XSL.


The W3C Web Services Architecture Working Group has updated the working drafts of Web Services Architecture Usage Scenarios, Web Services Architecture, and Web Services Glossary.

The Web service architecture (WSA) is intended to provide a common definition of a Web service, and define its place within a larger Web services framework to guide Web services product implementers, Web services specification authors, Web services application developers, and Web services students.

The WSA provides a model and a context for understanding Web services, and a context for placing Web services specifications and technologies into relationships with each other and with other technologies outside the WSA. The WSA promotes interoperability through the definition of compatible protocols. The architecture does not impose any requirements on the implementation of services, and imposes no restriction on how services might be combined. The WSA describes both the minimal characteristics that are common to all Web services, and a number of characteristics that are needed by many, but not all, Web services.


The W3C Quality Assurance (QA) Activity has posted the second public working draft of the QA Framework: Test Guidelines. "This document defines a set of common guidelines for conformance test materials for W3C specifications."


The W3C Web Services Internationalization Task Force has published the second public working draft of Web Services Internationalization Usage Scenarios. This describes various issues that arise when using SOAP services in multi-language environments. For example, is it possible to send error messages in both English and Japanese?

Thursday, May 15, 2003

The W3C CSS working group has published two new and four updated Cascading Style Sheets Level 3 (CSS3) draft specifications:

CSS3 Generated and Replaced Content Module
This first public working draft "describes how to insert and move content around a document, in order to create footnotes, endnotes, section notes. Inserted content can also introduce counters and strings, which can be used for running headers and footers, section numbering, and lists. Finally, techniques for declaring replaced images, as well as scaling and cropping them using CSS, are described."
CSS3 Speech Module
This first public working draft defines CSS properties used when documents are read out loud. These include voice-volume, voice-balance, speak, pause-before, pause-after, pause, cue-before, cue-after, cue, voice-rate, voice-family, voice-pitch, voice-pitch-range, voice-stress, voice-duration, phonemes, @phonetic-alphabet, and interpret-as.
CSS3 Text Module
This Candidate Recommendation defines text formatting properties like text-align and white-space. Many of these were already present in CSS2, but lots of new ones have been added, especially for East Asian and bidirectional text.
CSS3 Ruby Module
This candidate recommendation describes properties for the Ruby (inline annotation) elements of HTML used in East Asian text to indicate pronunciation.
CSS3 Color Module
This candidate recommendation describes properties such as color, color-profile, and opacity.
CSS TV Profile 1.0
This second candidate recommendation "defines a subset of Cascading Style Sheets Level 2 and CSS3 Module: Color specifications tailored to the needs and constraints of TV devices." The change is that the opacity property now simply accepts a number. There is no longer a priority argument.

Stefan Champailler's posted DTDDoc 0.0.6, a JavaDoc-like tool for creating HTML documentation of document type definitions from embedded DTD comments. This release can display the parents of an element or attribute, and lets you configure the title displayed at the top of the index. DTDDoc is published under the GPL.

Wednesday, May 14, 2003

Late Night Software has released XSLT Tools 1.0, a free-beer AppleScript addition for Mac OS X based on Xalan-C that adds XSLT and XPath support.


Johannes Döbler has released jd.xsltc, an XSLT stylesheet compiler that supports the now defunct XSLT 1.1 working draft. This tool compiles XSLT stylesheets into Java byte code, creating a Java class that is then used to transform XML documents according to the rules of the original stylesheet. Döbler reports that the compiled stylesheets are 50% faster than jd.xslt. It is available for non-commercial use only.
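The same compile-once, transform-many pattern is available out of the box in Java. A Templates object holds a stylesheet that has already been processed, and the JDK's built-in TransformerFactory is derived from Xalan's XSLTC, which compiles stylesheets to bytecode internally. This is a sketch using standard JAXP, not jd.xsltc, so it handles XSLT 1.0 only; the stylesheet and class name are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledStylesheet {

    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='/name'/>!</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiled once; a Templates object is thread-safe and reusable.
    private static final Templates TEMPLATES = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Each call gets a fresh, cheap Transformer (Transformers are not thread-safe).
    static String transform(String xml) {
        try {
            Transformer transformer = TEMPLATES.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<name>World</name>"));
        System.out.println(transform("<name>XSLT</name>"));
    }
}
```

The expensive stylesheet compilation happens once, when the class is loaded; every later transform reuses it.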

Tuesday, May 13, 2003

James Sleeman has posted gogoXML 0.5.1, an open source, tree based class library for processing XML with PHP that focuses on ease of use.


The W3C Multimodal Interaction Working Group has updated their Multimodal Interaction Framework Note. "This document introduces the W3C Multimodal Interaction Framework, and identifies the major components for multimodal systems. Each component represents a set of related functions. The framework identifies the markup languages used to describe information required by components and for data flowing among components. The W3C Multimodal Interaction Framework describes input and output modes widely used today and can be extended to include additional modes of user input and output as they become available."

Monday, May 12, 2003

John Wilson has released Skyron, an XML data binding framework for Python. Skyron enables "an XML document to be processed in a way which is described by a text file called a recipe. Skyron recipes identify interesting data in the XML document and say what is to be done with that interesting data. A Skyron recipe contains a set of descriptions of XML elements and a set of instructions of what to do with the data available at the beginning and at the end of an XML element."

Saturday, May 10, 2003

Yann Dirson's posted version 0.99.7 of sgml2x, a DSSSL formatter for XML and SGML based on jade. In 1.0.0, --dssslproc replaces --jade, which is now deprecated, and some man pages have been added.

Friday, May 9, 2003

The first beta of Mozilla 1.4 is out for the usual batch of platforms: Windows, Linux, OpenVMS, and Mac OS X (in fact, pretty much everything except Mac OS 9, which the Mozilla Project has effectively abandoned, much to my annoyance, especially since their last 1.2.1 release for Mac OS 9 has lots of nasty bugs). The popular open source web browser supports XML, CSS, XSLT, XHTML, HTML, DOM Level 1 and Level 2, and JavaScript. Java and Flash support are available through plug-ins. New features in 1.4 include dynamic image and table resizing in Composer, smooth scrolling (disabled by default), and improvements to spam filtering, as well as bug fixes addressing speed, stability, standards support, and website compatibility. Bookmarks now include a root level folder, the ability to have two differently named bookmarks pointing at the same location, site icons in the Bookmark Manager and Bookmarks Sidebar, and labelled separators. New features since the alpha release include:

  • Mozilla on Windows now has support for NTLM authentication. This enables Mozilla to talk to MS web and proxy servers that are configured to use "windows integrated security".
  • Users can now specify "blank page," "home page," or "last page visited" for each of first window, new window and new tab.
  • Users can now specify default font, size and color for HTML mail compose.
  • Mozilla Mail now has CRAM-MD5 and DIGEST-MD5 AUTH support.
  • "Launch file" after downloading has been enabled for .exe files.
  • Proxy auto-config (PAC) failover has been implemented.

One feature that is still sorely lacking from this release (or any other) is the ability to load normally blocked images into a page on a page-by-page basis. This would be useful on web sites that serve legitimate, non-ad images from servers other than the main page's, such as www.apple.com, www.officemax.com, and www.amazon.co.uk. This was available in Mosaic 1.0. It's shocking that it isn't available in Mozilla 10 years later.


Grzegorz Godlewski has posted PHP-Xerces 0.4, an open source XML Parser for PHP that supports DTD and W3C XML Schema validation. PHP-Xerces is published under the GPL.

Thursday, May 8, 2003

The W3C XML Protocol Working Group has published four proposed recommendations of SOAP 1.2:

The URIs for various parts of SOAP are now of the form http://www.w3.org/2003/05/soap/bindings/HTTP/, http://www.w3.org/2003/05/soap/mep/request-response/, etc. I'm a little surprised they're still month based at this late date. Otherwise I haven't been following this closely enough to say what's changed.

One thing I just noticed is that "Comment information items MAY appear as children and/or descendants of the [document element] element information item but not before or after that element information item". (This isn't new; I found it in the last draft too, but it was not previously spelled out nearly as clearly.) In other words, SOAP messages can't contain comments in the prolog or epilog. What, exactly, is the point of this? The subsetting just keeps getting more and more extreme.
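Checking a message against that rule is straightforward with DOM, because the children of the Document node are exactly the prolog, the root element, and the epilog. A minimal sketch, assuming comments are not being discarded by the parser (the JAXP default); the class name is hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PrologCommentCheck {

    // Returns true if a comment appears before or after the root element.
    static boolean hasPrologOrEpilogComment(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            // Comments inside the root element are not children of the Document node,
            // so only prolog and epilog comments are seen by this loop.
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.COMMENT_NODE) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String message = "<!-- routing note -->"
            + "<env:Envelope xmlns:env='http://www.w3.org/2003/05/soap-envelope'>"
            + "<env:Body/></env:Envelope>";
        System.out.println(hasPrologOrEpilogComment(message));
    }
}
```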


The W3C XHTML working group has published a new working draft of XHTML 2.0. XHTML 2.0 is the next, backwards incompatible version of HTML that incorporates XFrames, XForms, and lots of other crunchy XML goodness. However, XLink is not yet included and may never be. (The HTML Working Group are extreme XLink skeptics.) "This version includes an early implementation of XHTML 2.0 in RELAX NG [RELAXNG], but does not include the implementations in DTD or XML Schema form." (It's interesting that even the W3C working groups are starting to prefer RELAX NG.) Otherwise, it's not immediately obvious to me what's changed or added in this draft.

Wednesday, May 7, 2003

Microsoft has released service pack 2 for their MSXML parser and XSLT processor for Windows. This release "provides a number of security and bug fixes." The bugs fixed include standards conformance issues, memory leaks, program hangs, and incorrect behavior. However, they don't appear to have fixed the most serious and longstanding conformance issues like the incorrect handling of white space and the use of the fictional text/xsl MIME type.


Version 2.0.1 of the payware <Oxygen/> XML editor has been released. Oxygen supports XML, XSL, DTDs, and the W3C XML Schema Language. Version 2.0.1 is a bug fix release. Oxygen requires a recent Java Virtual Machine. It costs $74.

Tuesday, May 6, 2003

Michael Kay has released version 7.5 of Saxon, an experimental open source XSLT processor written in Java that supports large parts of the latest drafts of XSLT 2.0 and XPath 2.0. Most normal users should stick with Saxon 6.5.2 and XSLT 1.0 for the time being.


Oleg Tkachenko has posted the first alpha of an XInclude.NET library, an open source implementation of the XInclude 1.0 Candidate Recommendation written in C# for the .NET platform. XPointers are not supported yet. The .NET Framework 1.0 is required.

Monday, May 5, 2003

The W3C XQuery and XSLT Working Groups have dropped another load of working drafts into the world:

XSLT 2.0 and XQuery 1.0 Serialization is a new working draft based on material that was previously part of the XSLT 2.0 spec. It describes how to convert an XPath 2.0 data model into an XML document, document fragment, HTML, plain text, or other formats.

Changes to XPath 2.0 in these drafts include:

  • Elements and attributes can be selected by their type as well as by their name. For example, element(Person, surgeon) matches all Person elements whose type is surgeon. element(*, Person) matches all elements whose type is Person, regardless of element name.
  • A "validation mode" (strict, lax, or skip) has been added to the validate expression.
  • The value comparison operators are now transitive.
  • The syntax of the "cast" and "treat" expressions has changed so that the operator and target type follow the operand expression. For example, XPath now uses 5 cast as decimal instead of the old syntax cast as decimal 5.
  • Comment delimiters have been changed from {-- --} to (: :).
  • This draft uses the namespace URI http://www.w3.org/2003/05/xpath-datatypes (mapped to the prefix xdt) for the four primitive types defined in the XPath data model but not in the W3C XML Schema Language.
  • Two new simple types, xdt:untypedAtomic and xdt:anyAtomicType, have been introduced.
  • New functions include round-half-to-even, zero-or-one, one-or-more, and exactly-one. In addition, the insert function is now insert-before and the document function is now the doc function.
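The tie-breaking behavior of round-half-to-even is the same "banker's rounding" that Java's BigDecimal offers as RoundingMode.HALF_EVEN. The method below is a Java sketch of the function's semantics for decimal input only; it is not an XPath implementation, and it ignores non-decimal numeric types.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundHalfToEven {

    // Rounds value to the given number of digits after the decimal point. A value
    // exactly half way between two candidates goes to the one whose last digit is even.
    // A negative precision rounds to tens, hundreds, and so on.
    static BigDecimal roundHalfToEven(BigDecimal value, int precision) {
        return value.setScale(precision, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0.5", "1.5", "2.5", "-2.5"}) {
            System.out.println(s + " -> " + roundHalfToEven(new BigDecimal(s), 0));
        }
        System.out.println(roundHalfToEven(new BigDecimal("3.567812"), 2));
        System.out.println(roundHalfToEven(new BigDecimal("35612.25"), -2).toPlainString());
    }
}
```

Unlike ordinary round-half-up, 0.5 rounds down to 0 while 1.5 rounds up to 2, which avoids a systematic upward bias when many halves are summed.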

Major changes in XSLT 2.0 since the last working draft not directly related to XPath 2.0 include:

  • Output elements can be assigned complex types via [xsl:]validation and [xsl:]type attributes on xsl:element, xsl:attribute, xsl:copy, xsl:copy-of, and literal result elements.
  • XSLT instructions can construct arbitrary sequences of nodes and atomic values using xsl:variable and the new xsl:sequence instruction. The xsl:result element has been eliminated.
  • New format-dateTime, format-date, and format-time functions
  • xsl:text once again works like it did in XSLT 1.0. It can only contain a single text node, and no nested instructions or literal result elements.
  • A new current-grouping-key function enables the same value to be added to zero or more groups.
  • The [xsl:]default-xpath-namespace attribute has been renamed [xsl:]xpath-default-namespace.
  • It is now a compile-time error for xsl:call-template to supply a parameter whose name does not match the name of any parameter declared in the called template. In XSLT 1.0 the extra parameter was simply ignored.
  • An xsl:next-match instruction can find the second best match (second highest priority) for a template rule.
  • xsl:character-map replaces disable-output-escaping
  • New options for the system-property function provide information about the conformance levels and features offered by the processor.

XQuery changes, beyond those in XPath 2.0, include:

  • The concept of a "module" has been introduced. A module may contain a library of functions and variables that can be imported by other modules. An "import module" clause has been introduced into the Query Prolog, which is now simply called Prolog.
  • Element constructors automatically validate the newly constructed element.
  • XQuery now permits two kinds of implementation-defined extensions, called "pragmas" and "must-understand extensions". Implementations supporting these extensions must provide a "flagger" that recognizes whether a query uses a language extension.
  • Global variable declarations have been added to the Prolog.
  • A version number declaration has been added to the Prolog.

Sunday, May 4, 2003

ActiveState has released Visual XSLT 1.8, a $295 payware XSLT development plug-in for Visual Studio .NET. It includes an XSLT editor, XSLT debugger, template browser, and more. Version 1.8 is now compliant with Visual Studio .NET 2003.


Nitesh Ambastha and Tahir Hashmi have posted XQueeze 0.2, yet another attempt to define a smaller, binary compressed form of XML. XQueeze replaces "the long and descriptive identifiers that are described in XML grammar specifications (DTD, Schema etc.) with small bit-squences or symbols" defined in a per-document type data dictionary. Unlike some other XML compression formats, XQueeze's XQML can be directly generated and parsed (though not with standard XML tools).

If I'm reading the XQML spec right, they've made the very common mistake of assuming that documents actually adhere to their advertised DTDs. They don't seem to provide any mechanism for handling the case where an undeclared element appears, nor does XQueeze seem able to handle the very common case of documents without DTDs (though support for the latter is planned for a future release). XQueeze is written in C, and published under the GPL.
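One way to cope with undeclared elements is to reserve an escape for names that aren't in the dictionary. The class below is a hypothetical, text-based illustration of that idea; it is not XQML's actual encoding, which uses bit sequences rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagDictionary {

    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Assigns a small integer code to each element name declared in a (hypothetical) DTD.
    TagDictionary(List<String> declaredNames) {
        for (String name : declaredNames) {
            if (!codes.containsKey(name)) {
                codes.put(name, names.size());
                names.add(name);
            }
        }
    }

    // Declared names become "#<code>". Undeclared names are written out literally after
    // an escape character instead of being rejected. Neither '#' nor '!' can start an
    // XML name, so the two forms can never be confused.
    String encode(String name) {
        Integer code = codes.get(name);
        return code != null ? "#" + code : "!" + name;
    }

    String decode(String token) {
        if (token.startsWith("#")) return names.get(Integer.parseInt(token.substring(1)));
        if (token.startsWith("!")) return token.substring(1);
        throw new IllegalArgumentException("Unknown token: " + token);
    }

    public static void main(String[] args) {
        TagDictionary dict = new TagDictionary(Arrays.asList("book", "title", "chapter"));
        for (String name : Arrays.asList("title", "chapter", "sidebar")) {
            String token = dict.encode(name);
            System.out.println(name + " -> " + token + " -> " + dict.decode(token));
        }
    }
}
```

The cost of the escape is only paid for the names the dictionary doesn't know, so documents that do match their DTD still compress well.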

Saturday, May 3, 2003

I've posted a preliminary Recommended Reading list from Effective XML. This lists other sources of information about how to use XML well, as opposed to merely its syntax and processing, so I have not included more traditional tutorial books about XML such as Processing XML with Java or the underlying specifications. And I did deliberately omit the draft U.S. Federal Government guidelines for XML because in many cases I think they give bad advice. (I am hearing some scuttlebutt that the final version may be significantly improved from the draft, but until I actually see the final version I can't recommend it.) Thus, the list is relatively short. There hasn't been a huge amount of work in this space yet. If anyone has additional suggestions for this list, I'd appreciate hearing about them.


Version 0.8 of Pyana, a Python interface to the Xalan C XSLT processor, has been posted on SourceForge.


Andy Clark has posted a new release of his CyberNeko Tools for the Xerces Native Interface (NekoXNI). This release fixes assorted bugs, and supports Xerces 2.4.0.


IBM's alphaWorks has updated their XML Integrator, a "tool for bi-directional data conversion between XML and structured data formats such as relational or LDAP data. This tool externalizes the specification of the mapping between XML and relational databases, and it replaces the programming effort by the simpler effort of writing a script that describes the relationships between the XML constructs and the corresponding RDBMS constructs. XI can be used as a stand-alone utility, or it can be integrated as a library in other applications." This update provides the source code for Xi.java "to help users who wish to integrate XI into their applications instead of running it stand-alone."


AlphaWorks has also updated ToXgene, a "template-based generator for complex, semantically-correlated collections of XML documents. The data generation process in ToXgene is based on a conceptual description of the data to be generated (the templates). This tool is intended for cases in which the structure of the data to be generated is known, the data is required to conform to that structure, and multiple collections of documents, with varying structures, sizes and complexities, can easily be generated." This version adds a persistent object manager.

Friday, May 2, 2003

The W3C Web Content Accessibility Guidelines Working Group has posted the second public working draft of Web Content Accessibility Guidelines 2.0. Quoting from the introduction:

This document outlines design principles for creating accessible Web content. When these principles are ignored, individuals with disabilities may not be able to access the content at all, or they may be able to do so only with great difficulty. When these principles are employed, they also make Web content accessible to a variety of Web-enabled devices, such as phones, handheld devices, kiosks, network appliances, etc. By making content accessible to a variety of devices, that content will also be accessible to people in a variety of situations.

The design principles in this document represent broad concepts that apply to all Web-based content. They are not specific to HTML, XML, or any other technology. This approach was taken so that the design principles could be applied to a variety of situations and technologies, including those that do not yet exist.


Alexandre Brillant has released FastParser 1.5, a $50 shareware, non-validating, SAX parser for Java. Version 1.5 adds support for Java 1.1.

Brillant claims this parser is faster than Xerces and Crimson (which are known not to be the fastest parsers out there). However, his benchmarks only test one file, and it's not clear from his results whether FastParser was used in a mode that doesn't perform full well-formedness checking.


Thursday, May 1, 2003

Here are the preliminary cover designs for Effective XML. Let me know what you think:

Effective XML cover design with olive porphyria
Effective XML cover design with moon

The W3C has released Amaya 8.0, their open source testbed web browser and authoring tool for Solaris, Linux, and Windows that supports HTML, XHTML, XML, CSS, MathML, and SVG. New features in version 8.0 include menu access keys in Windows and improved support for SVG, SMIL, CSS and MathML.


Decisionsoft has released Pathan, an open source add-on for Xerces-C that can evaluate XPath 1.0 expressions to select DOM nodes.
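In Java, the standard javax.xml.xpath API does the same job of evaluating XPath 1.0 expressions to select DOM nodes. This is a minimal sketch using the JDK's built-in implementation, not Pathan's C++ API; the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSelect {

    // Parses xml and returns the nodes selected by an XPath 1.0 expression.
    static NodeList select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        NodeList items = select("<list><item>a</item><item>b</item></list>", "/list/item");
        for (int i = 0; i < items.getLength(); i++) {
            System.out.println(items.item(i).getTextContent());
        }
    }
}
```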


Antenna House, Inc. has released version 2.5 of its XSL Formatter. This release adds several proprietary extensions to XSL-FO, supports Thai line breaks, embeds EPS files, and fixes some bugs. Pricing runs from $1250 for a single user version to $5000 for a server version.

Wednesday, April 30, 2003

The W3C SVG Working Group has posted the second public working draft of Scalable Vector Graphics (SVG) 1.2. According to Chris Lilley, "There is a lot more technical content compared to the previous draft; a lot of work has gone into this one. This draft has real implementable (and in some cases, implemented) syntax." Suggested new features include:

  • Text wrapping inside shapes
  • XForms support
  • XML Events support
  • More SMIL features possibly including audio, video, transitions and enhanced timing controls.
  • Rendering Arbitrary XML inside SVG documents using style sheets
  • A printing profile
  • Enhanced Alpha Compositing
  • Z-indexes not based on document order
  • Streaming enhancements
  • The solidColor element is a paint server that provides a single color with opacity. It can be referenced like the other paint servers (gradients and patterns).
  • Background fills
  • Alternate content based on display resolutions
  • Keyboard navigation between picture elements
  • DOM access to images
  • Conversion of mouse coordinates to the corresponding user space coordinates
  • DOM Level 3 Events
  • A standard SVGWindow interface

Tuesday, April 29, 2003

Jason Hunter has posted beta 9 of JDOM, the popular open source, tree-based API for processing XML with Java. This beta works around a memory leak in the StringBuffer class in some recent versions of Java, distinguishes IOExceptions from JDOMExceptions, makes various other small changes and additions, and fixes numerous bugs. Aside from the additional exception thrown by the parse() method, little code should need to be changed to upgrade to beta 9.


The XML Apache Project has released Xalan-C++ 1.5, an open source XSLT processor written in standard C++. Version 1.5 fixes bugs, supports C++ namespaces, and expands support for the EXSLT library. On Windows, Xalan-C++ is now packaged as a single DLL. Xerces-C++ 2.2.0 is suggested for this release.


Monday, April 28, 2003

I've posted Chapter 47 of Effective XML, Catalog Common Resources. Catalogs are a little-known but very effective tool for indirectly locating DTDs, stylesheets, and other support documents. This chapter explains how to use catalog files to resolve PUBLIC IDs and cache local copies of common documents.
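To make the idea concrete, here is a minimal Java sketch of what a catalog does at parse time, written against the standard SAX `EntityResolver` interface. The class name `SimpleCatalogResolver`, its in-memory map, and the local file path in the usage line are hypothetical; real catalogs use the OASIS XML Catalogs file format and a full resolver, which this sketch does not implement.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical, minimal stand-in for a catalog: maps PUBLIC identifiers
// to local copies of the documents they identify.
class SimpleCatalogResolver implements EntityResolver {
    private final Map<String, String> publicToLocal = new HashMap<>();

    // Register a local URI (file:, jar:, and so on) for a public identifier.
    void map(String publicId, String localUri) {
        publicToLocal.put(publicId, localUri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : publicToLocal.get(publicId);
        if (local != null) {
            return new InputSource(local); // read the cached copy instead of the remote one
        }
        return null; // null tells the parser to use the system identifier as usual
    }
}
```

Install it with `setEntityResolver(resolver)` on an `XMLReader` or a `DocumentBuilder`, so that, for example, the XHTML 1.0 Strict DTD is read from a hypothetical local path such as `file:///usr/local/share/xml/xhtml1-strict.dtd` instead of from the W3C's server.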


The Apache Commons Team has released Digester 1.5, a SAX-based XML-to-object mapper designed primarily for parsing XML configuration files, though it has other uses too. Digester is configured through an XML-to-Java object mapping module, which triggers actions whenever a pattern of nested XML elements is recognized. Version 1.5 fixes some backwards compatibility issues that cropped up in 1.4, fixes a few bugs, and adds support for regular expressions in patterns.
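The pattern-triggered style of processing that Digester uses can be sketched with plain SAX. This is not Digester's API: `PatternRules` and `addRule` are hypothetical names, and the sketch only matches exact slash-separated element paths, not Digester's wildcard or regular expression patterns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical Digester-like rule engine: fires an action when the current
// element path (e.g. "config/servers/server") exactly matches a registered pattern.
class PatternRules extends DefaultHandler {
    private final Map<String, Consumer<Attributes>> rules = new HashMap<>();
    private final List<String> path = new ArrayList<>();

    void addRule(String pattern, Consumer<Attributes> action) {
        rules.put(pattern, action);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.add(qName); // a non-namespace-aware parser always reports qName
        Consumer<Attributes> action = rules.get(String.join("/", path));
        if (action != null) {
            action.accept(atts); // Attributes is only valid during this callback
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.remove(path.size() - 1);
    }

    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the rule is keyed on the full path, a `server` element that appears directly under `config` does not fire the `config/servers/server` action.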

Sunday, April 27, 2003

Dennis Sosnoski has posted the first beta of JiBX, yet another open source framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding-document camp, as opposed to schema-driven binding frameworks like JaxMe and JAXB. I haven't looked at JiBX in detail, but in general I find APIs based on custom binding documents to be a lot more flexible and potentially useful than those based on a schema. Quoting from the JiBX web site,

The JiBX framework handles all the details of converting your data to and from XML based on your instructions. JiBX is designed to perform the translation between internal data structures and XML with very high efficiency, but still allows you a high degree of control over the translation process.

How does it manage this? JiBX uses binding definition documents to define the rules for how your Java objects are converted to or from XML (the binding). At some point after you've compiled your source code into class files you execute the first part of the JiBX framework, the binding compiler. This compiler enhances binary class files produced by the Java compiler, adding code to handle converting instances of the classes to or from XML. After running the binding compiler you can continue the normal steps you take in assembling your application (such as building jar files, etc.).

The second part of the JiBX framework is the binding runtime. The enhanced class files generated by the binding compiler use this runtime component both for actually building objects from an XML input document (called unmarshalling, in data binding terms) and for generating an XML output document from objects (called marshalling). The runtime uses a parser implementing the XMLPull API for handling input documents, but is otherwise self-contained.
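To see what that generated code has to do, the marshalling and unmarshalling can be written by hand. This sketch is not JiBX output and uses the StAX API (`javax.xml.stream`, standard since Java 6) rather than XMLPull; `Person` and `PersonBinding` are hypothetical, and the element and attribute names are arbitrary choices standing in for a binding definition.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical domain class: the kind of plain object a binding framework targets.
class Person {
    String name;
    int age;
}

// Hand-written binding for <person age="..."><name>...</name></person>
class PersonBinding {
    static String marshal(Person p) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("person");
            w.writeAttribute("age", Integer.toString(p.age));
            w.writeStartElement("name");
            w.writeCharacters(p.name); // the writer escapes & and <
            w.writeEndElement();
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    static Person unmarshal(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            Person p = new Person();
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    if ("person".equals(r.getLocalName())) {
                        p.age = Integer.parseInt(r.getAttributeValue(null, "age"));
                    } else if ("name".equals(r.getLocalName())) {
                        p.name = r.getElementText(); // reads up to </name>
                    }
                }
            }
            r.close();
            return p;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A binding compiler essentially writes this kind of code for you from the binding definition, which is why it can be fast: there is no reflection at runtime.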

Saturday, April 26, 2003

Jochen Wiedmann's released JaxMe 1.5.8, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with bean managed persistence (BMP). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release adds support for xs:maxLength, xs:minLength, xs:enumeration, xs:pattern, xs:include, and xs:anyType as well as BLOB and CLOB handling when working with a relational database.

Friday, April 25, 2003

I've posted Chapter 50 of Effective XML. Item 50, Compress if space is a problem, suggests using standard compression algorithms such as gzip or zip to store XML documents.
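With the standard `java.util.zip` classes, compressing XML takes only a few lines, and a parser can read straight from the decompressing stream. `XmlGzip` is a hypothetical helper name; how much space you save depends heavily on the document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlGzip {
    // Gzip the UTF-8 bytes of a serialized document.
    static byte[] compress(String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bytes.toByteArray(); // complete only after the gzip stream is closed
    }

    // Parse directly from the decompressing stream; no temporary file is needed.
    static Document parseCompressed(byte[] gzipped) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Repetitive, record-like documents with many identical tag names tend to compress especially well.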


Daniel Veillard has released version 2.5.7 of libxml2, the XML C library for Gnome. Version 2.5.7 adds Relax-NG streaming validation, large file support, thread support by default, performance and low memory handling work, xmlReader to DOM extensions, and fixes some bugs.

Thursday, April 24, 2003

I've posted Chapter 35 of Effective XML. Item 35, Navigate with XPath, explains when, why and how to use XPath from within your own programs to write much more robust software that is only loosely coupled to document structures.
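A small Java illustration of that loose coupling, using the standard `javax.xml.xpath` API (Java 5 and later; not necessarily the API the chapter itself uses). The helper class `XPathTotals` is hypothetical. Because the query is written as `sum(//item/price)`, the same code keeps working if the items are later wrapped in extra elements such as a hypothetical `shipment`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathTotals {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluate an XPath 1.0 expression that returns a number.
    static double number(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Compare this with walking `getFirstChild()` and `getNextSibling()` by hand, where every structural change to the document means a code change.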

Wednesday, April 23, 2003

I've posted Chapter 31 of Effective XML. Item 31, Program to Standard APIs, explains how to write code that can be run on multiple parsers to find the best performance.
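One common way to follow this advice in Java is to code only against the JAXP and SAX interfaces and let the factory choose the implementation at runtime. A sketch, with `PortableParse` as a hypothetical class name: the same code runs on whatever parser the JAXP lookup selects (for instance via the `javax.xml.parsers.SAXParserFactory` system property), so parsers can be swapped for comparison without recompiling.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper that touches only standard JAXP/SAX types,
// never a concrete parser class.
class PortableParse {
    static int countElements(String xml) {
        try {
            // JAXP picks the implementation: the javax.xml.parsers.SAXParserFactory
            // system property if set, then the other JAXP lookup steps, then the default.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            int[] count = {0};
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Reports which implementation was actually chosen, useful when comparing parsers.
    static String implementationName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }
}
```

Running the program with `-Djavax.xml.parsers.SAXParserFactory=...` set to a different factory class is then enough to benchmark another parser.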

I've also posted the first draft of the acknowledgements page. It's not too late to see your name in print. All you have to do is be the first one to spot a mistake and send in a correction, or make a useful suggestion. All comments are appreciated!


The W3C Guidelines, Education & Outreach Task Force (GEO) of the W3C Internationalization Working Group (I18N WG) has published the first public working draft of Framework Document for i18n Guidelines 1.0. This document "describes plans for producing documents that provide guidelines on internationalization of W3C technologies."

Tuesday, April 22, 2003

I've posted Chapter 15 from Effective XML. Item 15, Build on structures, not syntax, elaborates for the first time in public my five-layer stack for XML processing (inspired by the five-layer TCP/IP network stack). Thinking of XML as a stack of layers, each of which depends only on the layer below it, makes the answers to many frequent questions very obvious, clears up a lot of the confusion in the XML community, and points the way to more effective API and application design in the future. In brief, the XML stack looks like this:

Semantics: Objects and data structures built from the information in the XML document; customized for the local environment
Structures: elements, attributes, text nodes, processing instructions
Syntax: tags, entity references, character references, CDATA sections, PCDATA
Lexical: Unicode characters in a sequence
Binary: Bytes in a sequence

This model has been implicit in a lot of XML work. For instance, in the XML 1.0 specification the BNF grammar mostly describes the syntax while the well-formedness constraints focus on the structure. In SAX, the core ContentHandler interface provides the structure, while the optional LexicalHandler interface reports the syntax. However, I don't think this stack has really been spelled out anywhere like this before, and most descriptions of XML processing leave out at least one of these layers, typically by improperly merging the semantic layer with the structure layer or the structure layer with the syntax layer. For instance, there was a recent debate on xml-dev about whether or not XML was a tree, or if that was just one way of viewing it. I think the real answer is that it is a tree if you're looking at the structure layer, but it's a linear sequence of syntax items (that is, not a tree) at the syntax layer. And when you get to the semantic layer, anything goes because it's all local. You can have a tree or a graph or any other data structure you care to instantiate from the document.

Different tasks need to operate at different layers, and you need different APIs for these different layers, just as in networking you would use a different layer for writing a traceroute program (based on the IP layer) than for a telnet program (based on the TCP layer). For example, most programs work on the structure layer, but a source code level XML editor would probably want to plug into the syntax layer instead. Data binding APIs expose the semantic layer, while other APIs such as SAX, DOM, XOM, and JDOM mostly expose the structure layer. However, in practice there's been a lot of confusion because many APIs mix up pieces of the syntax layer with pieces of the structure layer. The result has been as ugly as what you'd get by mixing bits of IP into a TCP-layer networking API. All comments are appreciated.
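The syntax/structure distinction is easy to demonstrate with SAX. In this sketch (the class name `SyntaxVsStructure` is hypothetical), three documents that differ only at the syntax layer, using entity references, a CDATA section, and character references respectively, report identical text to a `ContentHandler`. Only a `LexicalHandler` would reveal where the CDATA section began and ended.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SyntaxVsStructure {
    // Returns all character data as the structure layer (ContentHandler) sees it.
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // A parser may split text across several calls, so accumulate.
                        text.append(ch, start, length);
                    }
                });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return text.toString();
    }
}
```

`<a>&lt;b&gt;</a>`, `<a><![CDATA[<b>]]></a>`, and `<a>&#60;b&#x3E;</a>` all yield the same three characters, `<b>`, which is why code built on the structure layer should not care which syntax an author chose.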


ConnectTel has posted version 1.003-Beta of XQX, which provides access to relational databases through an XML interface. XQX retrieves data from a SQL database and transforms the result into a standard XML format that can then be transformed into whatever format you want using XSLT. (As far as I know this approach was pioneered by FileMaker 6, and it strikes me as very sensible.) XQX can operate as a remote web service using SOAP or as a local library that can access a variety of relational databases. Supported RDBMSs include Microsoft SQL Server, Sybase, DB2, Informix, InterBase, Centura, MySQL, and PostgreSQL as well as anything with an ODBC driver.
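The rows-to-XML-to-XSLT approach can be sketched with the standard JAXP classes. This is not XQX's API or output format: the `result`/`row`/`column` vocabulary and the `ResultSetToXml` class are hypothetical, and a `List` of `Map`s stands in for a JDBC `ResultSet` so that the example runs without a database.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: query results -> generic XML -> XSLT -> any format.
class ResultSetToXml {
    // Example stylesheet: one title per row, as semicolon-terminated text.
    static final String TITLES_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:for-each select='result/row'>"
      + "<xsl:value-of select=\"column[@name='title']\"/><xsl:text>;</xsl:text>"
      + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    // Each Map is one row (column name -> value), standing in for a JDBC ResultSet.
    static Document toXml(List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element result = doc.createElement("result");
            doc.appendChild(result);
            for (Map<String, String> row : rows) {
                Element rowElement = doc.createElement("row");
                for (Map.Entry<String, String> col : row.entrySet()) {
                    Element column = doc.createElement("column");
                    column.setAttribute("name", col.getKey());
                    column.setTextContent(col.getValue());
                    rowElement.appendChild(column);
                }
                result.appendChild(rowElement);
            }
            return doc;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String transform(Document doc, String xslt) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)))
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The appeal of the design is that the database layer only ever produces one generic format, and every presentation (HTML, CSV, another XML vocabulary) is just another stylesheet.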


From the "You have got to be kidding me" department comes Martin Klang's YAPP XSLT 1.0, yet another parser parser. This is "a lexical scanner and recursive descent parser generator, implemented in XSLT. No language extensions or non-standard features are used apart from the nodeset() function. Grammars are expressed in XML form and transformed by the generator stylesheet into another XSLT. A lexical scanner may also be generated from the same grammar."


Jochen Wiedmann's released JaxMe 1.5.7, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with bean managed persistence (BMP). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release adds support for the xsd:base64Binary type.
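The idea behind `xsd:base64Binary` is simply to store bytes as Base64 text inside an element. A Java sketch using `java.util.Base64` (Java 8 and later); this is not JaxMe's generated code, and `Base64InXml` and the `data` element are hypothetical.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class Base64InXml {
    // Base64 output uses only A-Z, a-z, 0-9, +, / and =, all safe in XML text.
    static String embed(byte[] data) {
        return "<data encoding=\"base64\">" + Base64.getEncoder().encodeToString(data) + "</data>";
    }

    static byte[] extract(String xml) {
        try {
            String text = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
            // The MIME decoder skips line breaks and other non-alphabet characters
            // that pretty-printers and editors often insert.
            return Base64.getMimeDecoder().decode(text);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```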

Monday, April 21, 2003

I've posted the first public draft of Chapter 33 of Effective XML, Choose DOM for Standards Support. This short chapter explains when and when not to use DOM for processing XML. All comments are appreciated.


The W3C XKMS Working Group has posted last call working drafts of XML Key Management Specification (XKMS) and XML Key Management Specification (XKMS) Bindings. XKMS is a set of "protocols for distributing and registering public keys, suitable for use in conjunction with the standard for XML Signatures [XML-SIG] defined by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) and companion standard for XML encryption [XML-ENC]. The XML Key Management Specification (XKMS) comprises two parts -- the XML Key Information Service Specification (X-KISS) and the XML Key Registration Service Specification (X-KRSS). These protocols do not require any particular underlying public key infrastructure (such as X.509) but are designed to be compatible with such infrastructures." Comments are due by May 23.

Sunday, April 20, 2003

I've posted the first public drafts of the preface and Chapter 14 of Effective XML. Chapter 14, Allow All XML Syntax, explains the difference between syntax and structure (a distinction which will be expanded on in Chapter 15) and suggests that developers focus their attention on the structure rather than the syntax. All comments are appreciated.


Version 1.2.10 of Galeon, Gnome's open source web browser for Linux based on the Mozilla Gecko engine, has been released. Galeon supports direct display of XML documents with attached CSS stylesheets. Version 1.3.4 (the unstable version) has also been posted. This one's based on Mozilla 1.4.

Saturday, April 19, 2003

I've posted the first public draft of Chapter 3 of Effective XML, Stay with XML 1.0. This forward-looking chapter explains the differences between XML 1.0 and XML 1.1, and shows why very few people have any use for XML 1.1. This chapter uses some characters that are probably not available on most people's systems. All comments are appreciated. I've already received a number of helpful comments on the zeroth chapter I posted Wednesday.


The community review draft of Java Specification Request (JSR) 173, Streaming API for XML (StAX), has been posted. This describes a standard pull API for processing XML with Java, but currently it's only available to people who've submitted to the Java Community Process's restrictions on open discussion. Anybody who feels like leaking a draft, you know my e-mail address. :-)
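JSR 173 eventually shipped as `javax.xml.stream` in Java 6, so this sketch uses that final API; it may differ in details from the community review draft mentioned here. The point of a pull API is that the program asks for the next event, so it can stop as soon as it has what it needs. `PullFirstTitle` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullFirstTitle {
    // Returns the text of the first <title> element, or null if there is none.
    static String firstTitle(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    int event = r.next(); // the program, not the parser, drives the loop
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "title".equals(r.getLocalName())) {
                        return r.getElementText(); // stop pulling once we have the answer
                    }
                }
                return null;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With SAX, the equivalent code would need a flag in a callback class and, to stop early, an exception thrown out of the handler.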


The Unicode Consortium has released Unicode 4.0. "1,226 new character assignments were made to the Unicode Standard, Version 4.0 (over and above what was in Unicode 3.2). These additions include currency symbols, additional Latin and Cyrillic characters, the Limbu and Tai Le scripts; Yijing Hexagram symbols, Khmer symbols, Linear B syllables and ideograms, Cypriot, Ugaritic, and a new block of variation selectors (especially for future CJK variants). Double diacritic characters were added for dictionary use." The printed book will not be available until September, but the data file can be downloaded now.


Friday, April 18, 2003

I've posted the first public draft of Chapter 1 of Effective XML, Include an XML Declaration. All comments are appreciated. I've already received a number of helpful comments on the zeroth chapter I posted Wednesday.

I do know the HTML in these chapters is less than ideal. Right now I'm just posting what OpenOffice Writer saves. I will eventually have to figure out ways to clean this up, make it look a lot prettier, add links between the chapters, and so forth. However, at the moment my focus is on the content rather than the formatting.


Uche Ogbuji has posted the first alpha of 4Suite 1.0, an open source "platform for XML and RDF processing, with base libraries and a server framework. It is implemented in Python and C, and provides Python and XSLT APIs, Web and command line interfaces." Supported standards include DOM, SAX, RDF, XSLT, XInclude, XPointer, XLink and XPath.


Dave Beckett has posted the Raptor RDF Parser Toolkit 0.9.10, an open source C library for parsing the RDF/XML and N-Triples Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. Raptor is published under the LGPL.

Thursday, April 17, 2003

I am very pleased to report that the Fibonacci web services on elharo.com are running again at the original URLs published in Processing XML with Java. Thanks to everyone who wrote in with suggestions. What I ultimately had to do was shut down the Sun Cobalt Qube and replace it with a stock Red Hat Linux box running the stock versions of Apache and Tomcat. I never could get the Qube to a point where it could run standard programs instead of Sun's special versions that didn't actually let me put the servlets at the URLs where I needed to put them.

There are a few interesting lessons and thoughts I took away from this experience. All the difficulty I had in getting these services up and running in a hopefully stable configuration came from HTTP, specifically from integrating them with an existing web server. Generating and processing the XML was trivial by comparison. It would have been easier to write a simple custom server that spoke a special purpose protocol which both served and received XML than it was to tunnel this stuff over HTTP.

All of the web services frameworks I've seen address the easy question: how to generate and process XML. They don't seem to make any progress on what is in my experience the much tougher question: how does one integrate this into a web server? I think they might be solving the wrong problem. Some of this stuff, such as SOAP and XML-RPC, probably shouldn't have been tied to port 80 in the first place.


FileMaker, Inc. has released FileMaker Developer 6.0v4, a minor update to the non-SQL $499 payware database for Mac OS and Windows. FileMaker can export and import data as XML by using XSL stylesheets to define mappings. This release fixes a number of bugs in the XML support.

Wednesday, April 16, 2003

For the last eight months or so, I've been very cagey when asked about my next book, but I'm now in the last stretch of review, and I finally feel like it's in decent enough shape to pull back the curtain and let the world have a look. The book is Effective XML: 50 Specific Ways to Improve Your Applications and Documents, and it follows in the footsteps of Scott Meyers' Effective C++ and Joshua Bloch's Effective Java. Like those books it will be published by Addison-Wesley. The current list of chapters/principles is:

  0. Define your terms
  1. Include an XML declaration
  2. Mark up with ASCII if possible
  3. Stay with XML 1.0
  4. Use standard entity references
  5. Comment DTDs liberally
  6. Name elements with camel case
  7. Parameterize DTDs
  8. Modularize DTDs
  9. Distinguish text from markup
  10. White space matters
  11. Make structure explicit through markup
  12. Store metadata in attributes
  13. Remember mixed content
  14. Allow all XML syntax
  15. Build on top of structures, not syntax
  16. Prefer direct attribute values to unparsed entities and notations
  17. Use processing instructions for process-specific content
  18. Include all information in instance documents
  19. Encode binary data using quoted printable and/or Base64
  20. Use namespaces for modularity and extensibility
  21. Rely on namespace URIs, not prefixes
  22. Don't use namespace prefixes in element content and attribute values
  23. Reuse XHTML for generic narrative content
  24. Choose the right schema language for the job
  25. Pretend there's no such thing as the PSVI
  26. Version documents, schemas, and stylesheets
  27. Mark up according to meaning
  28. Use only what you need
  29. Always use a parser
  30. Layer functionality
  31. Program to standard APIs
  32. Choose SAX for computer efficiency
  33. Choose DOM for standards support
  34. Read the complete DTD
  35. Navigate with XPath
  36. Serialize XML with XML
  37. Validate inside your program with schemas
  38. Write documents in Unicode
  39. Parameterize XSLT style sheets
  40. Avoid vendor lock-in
  41. Hang on to your relational database
  42. Document namespaces with RDDL
  43. Preprocess XSLT on the server side
  44. Serve XML+CSS to the client
  45. Pick the correct MIME media type
  46. Tidy up your HTML
  47. Catalog common resources
  48. Verify documents with XML digital signatures
  49. Hide confidential data with XML encryption
  50. Compress if space is a problem

This book has been a major undertaking for me. Although it's relatively short (probably under 400 pages), a lot of effort went into those pages, and it's not done yet. I've probably put more effort and time into this book than I did into my typical 1200-page behemoth, but I think it's been worth it. This is going to be an exciting book, and probably a little controversial. Some major efforts in the XML community get called out in various chapters as examples of exactly how not to design good XML, but I think I can back up my claims, and I hope that the designers of future applications will learn enough from this book not to make the same mistakes.

I'm in the final stages of review, but comments are still useful and appreciated, and a few of the chapters listed above may yet change as the book is prepared for publication. The "zeroth" chapter, Define your terms, is online now. I'll be posting more for your perusal, comment, and calumny over the next couple of weeks.

Tuesday, April 15, 2003

The XML Apache Project has released Xalan-Java 2.5, an open source XSLT processor. Most of the changes in this release appear to be behind the scenes, or in the deep API. Developers using Xalan for basic transformations probably won't notice any differences.

Monday, April 14, 2003

Apple has posted the second beta of their Safari web browser for Mac OS X. Safari supports direct display of XML documents with CSS stylesheets but does not support XSLT. New features in this beta include:

  • Tabbed browsing
  • AutoFill forms & passwords
  • Privacy reset
  • Japanese, French and German localizations
  • Import of Netscape and Mozilla bookmarks
  • Increased standards compatibility
  • Improved AppleScript support

Download it through Software Update.

Saturday, April 12, 2003

Brownpot Software has released XDataFinder, an open source "Java swing application for browsing and querying XML files and native XML database, such as Apache Xindice database. XDataFinder interacts with user to compose XPath statement based on the XML Schema associated with the XML file or database."


The W3C Math Working Group has published the last call working draft of the second edition of MathML 2.0. "The preparation of a Second Edition of the MathML 2.0 Specification allows the revision of that document to provide a coherent whole containing corrections to all the known errata and clarifications of some smaller issues that proved problematic. It is not the occasion for any fundamental changes in the language MathML 2.0."

Friday, April 11, 2003

Version 1.4.1 of Jaxe, an open source (GPL) XML GUI editor written in Java 1.3, has been released. It is configurable with an XML schema and a configuration file, supports validation at element insertion, is customisable via Java modules, and can use XSLT to display documents as XHTML. A little unusually, the configuration files are written in French--ENCODAGE instead of ENCODING, BALISE instead of TAG, etc. The user interface has been localized into French, English, and German. Version 1.4.1 is a bug fix release.


Brendan Macmillan's posted version 0.8.3 of Java Serialization for XML (JSX) 2, a library for converting Java objects into streams of XML and reading the objects back from the streams. To use it, replace ObjectOutputStream with JSX.ObjectWriter and ObjectInputStream with JSX.ObjectReader. This release adds support for the Externalizable interface.
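This sketch does not use JSX. It shows the same stream-style idea with the JDK's own `java.beans.XMLEncoder` and `XMLDecoder`, a different, standard mechanism that writes an XML description of an object graph. It round-trips a `java.util.ArrayList` because `XMLEncoder` works best with public JavaBeans and standard collections; `XmlObjectStreams` is a hypothetical helper.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class XmlObjectStreams {
    // Analogous in spirit to ObjectOutputStream.writeObject, but the output is XML.
    static byte[] write(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(o);
        } // closing the encoder writes the closing </java> tag
        return bytes.toByteArray();
    }

    // Analogous in spirit to ObjectInputStream.readObject.
    static Object read(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }
}
```

Unlike binary serialization, `XMLEncoder` records only properties reachable through public getters and setters, so it is not a drop-in replacement for every `Serializable` class.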


IBM's alphaWorks has released the XML Forms Package, a Java toolkit for working with XForms that includes both server-side and client components. The server side data model component is "a set of Java APIs for creating, accessing, and modifying XForms data models. This package also includes a JSP tag library that provides a set of tags for use inside JSPs. The tag library interfaces with the XForms data model component APIs." The client component includes an XForms processor control plug-in for Internet Explorer and an XForms "compiler" written in Java that converts XHTML with XForms into HTML+JavaScript, XML, and DOM for use with other modern browsers like Mozilla.


Opera Software ASA has released version 7.10 of their namesake web browser for Windows that supports direct display of XML with attached CSS style sheets. New features in this release include local web page notes, fast forward and rewind buttons that guess where you want to go next, and slide shows for photos.

They have also posted the first beta of Opera 7.10 for Linux, the first 7.x version of Opera for the platform. Opera is $39 payware.


The OpenOffice Project has released OpenOffice 1.0.3, a bug fix release of the open source office suite for Linux and Windows that saves all its files as zipped XML.

Thursday, April 10, 2003

I've posted version 1.0d11 of XOM, my tree-based API for processing XML with Java. XOM strives for maximum simplicity and absolute correctness. The new feature in this release is an ANT build file. This should make it much easier to compile XOM from source. ANT is not included though. You'll have to download and install it separately.

There are no API-level changes in this release. All code that ran before should still run. This release fixes a few assorted bugs reported by users. Not surprisingly, these all appeared in the Builder and Serializer classes, which out of all the classes in XOM are the least well-covered by unit tests. I've expanded the unit tests to catch these and related bugs. The unit tests all pass, assuming you use a non-buggy SAX2 parser. However, if you run the JUnit GUI from the ANT build file, some confusing class loader issues cause the more-buggy Crimson to be loaded instead of the less-buggy Xerces. This breaks four unit tests. Everything should pass if you run the tests directly instead of from ANT. (That is, type "java -Xmx96m junit.swingui.TestRunner nu.xom.tests.XOMTests" instead of "ant testui".) If anyone can explain to me how I might fix this, I'd appreciate it.


Hedzer Westra has released XMill 0.8, a special purpose compression tool for XML data. It has been established that XML-based file formats are normally larger than the corresponding binary formats, and that after compression they range from a little smaller to a little bigger than the compressed binary. However, XMill "is based on a regrouping strategy that leverages the effect of highly-efficient compression techniques in compressors such as gzip. XMill groups XML text strings with respect to their meaning and exploits similarities between those text strings for compression. Hence, XMill typically achieves much better compression rates than conventional compressors such as gzip."

This appears to be a fork of the original AT&T XMill. New features in this fork include:

  • Repackaging as a thread-safe library with a C++ API instead of just a command line application
  • The general purpose compressor (gzip, bzip2 or ppmdi) can be chosen at run-time.
  • Special purpose compressors for base64 encoded data and arbitrary-base, arbitrary-length numbers. (I consider this cheating since it moves information from the documents into the code. The user has to tell the compressor which elements contain such information.)
  • Unit tests
  • An integrity checker to test the integrity of XMI files.

XMill is written in C++, and is published under a BSD license.
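The regrouping idea can be sketched with SAX: separate the tag structure from the text, and put text from the same element path into the same container, so that similar strings (all the IP addresses in a log, say) end up next to each other for the general-purpose compressor. This is a simplification, not XMill's actual algorithm or file format: `XmillGrouping` is hypothetical, and it ignores attributes and discards whitespace-only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XmillGrouping extends DefaultHandler {
    final StringBuilder structure = new StringBuilder();            // tags, with '#' for text
    final Map<String, List<String>> containers = new LinkedHashMap<>(); // path -> text values
    private final List<String> path = new ArrayList<>();
    private final StringBuilder pending = new StringBuilder();

    static XmillGrouping split(String xml) {
        XmillGrouping handler = new XmillGrouping();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler;
    }

    // Move buffered text into the container for the current element path.
    private void flushText() {
        String text = pending.toString().trim();
        pending.setLength(0);
        if (!text.isEmpty()) {
            containers.computeIfAbsent(String.join("/", path), k -> new ArrayList<>()).add(text);
            structure.append('#'); // placeholder: "next value from this path's container"
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        path.add(qName);
        structure.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        path.remove(path.size() - 1);
        structure.append("</>");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }
}
```

A real tool would then run gzip or a similar compressor over the structure stream and over each container separately.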


Wolfgang Meier of the Darmstadt University of Technology has posted version 0.9.1 of eXist, an open source native XML database that supports fulltext search. XML can be stored in either the internal, native XML-DB or an external relational database. The search engine has been designed to provide fast XPath queries, using indexes for all element, text and attribute nodes. The server is accessible through HTTP and XML-RPC interfaces and supports the XML:DB API for Java programming. Besides bug fixes, new features in version 0.9.1 include the highlighting of search terms in query results, simpler database startup, access to index information, new Cocoon examples, and an IzPack-based installer. eXist is published under the LGPL.

Wednesday, April 9, 2003

Peter J. Jones has posted xmlwrapp 0.4.1, a C++ library for working with XML built on top of Daniel Veillard's libxml2. This release fixes bugs and adds a few new functions. xmlwrapp is published under a BSD license.

Tuesday, April 8, 2003

Topologi has released version 1.1.3 of the Collaborative Markup Editor (CME), a $AUS105 payware, source-level, non-tree XML editor. CME is written in Java and runs on Mac OS X, Windows, and Linux. New features in this release include:

  • New button bar and menu arrangement, for more convenient validation and preview
  • Complete help files
  • See whether a file is currently being edited by another member of your workgroup
  • List and sort validation results
  • Check the path of the current element
  • Improved usability for low-vision users

Monday, April 7, 2003

The first beta of Xerlin 1.3, an open source XML Editor written in Java, has been posted. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include XML Schema support, WebDAV capabilities, and various user interface enhancements. Java 1.2 or later is required.


Recently I've noticed a few "unannounced" XML and Unicode features in the latest Mozillas:

  • Mozilla 1.3 and later now have a syntax-colored source code tree view of unstyled XML documents, sort of like the one Internet Explorer has had for years.
  • Mozilla 1.3 and later can display Unicode characters with code points beyond 65,535, such as the F clef (U+1D122): 𝄢. I haven't verified this on anything except Mac OS X yet, though.
  • I haven't personally verified this one yet, but I've got it on reliable authority that the new Mozilla 1.4 alpha adds XPointer support for the element(), xmlns(), fixptr(), and xpath1() schemes. It does not yet support the xpointer() scheme.
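Why characters beyond 65,535 are tricky for software: in Java (and in DOM, which uses UTF-16 strings), such a code point occupies two `char`s, a surrogate pair. A short sketch using the F clef, U+1D122; `Supplementary` is a hypothetical helper.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class Supplementary {
    static final int F_CLEF = 0x1D122; // MUSICAL SYMBOL F CLEF, outside the BMP

    // Build a String from a code point; Java stores it as a surrogate pair.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Parse a document and return the root element's text, e.g. from &#x1D122;.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The F clef has a `length()` of 2 but a code point count of 1, and takes four bytes in UTF-8, so code that indexes or truncates strings by `char` can split it in half.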

XML Benchmark 1.1 is a C/C++/Java toolset for benchmarking XML parsers including libxml2, Xerces, Oracle XDK, Expat, RXP, QT, and Crimson. Benchmarks include parsing (native, SAX, DOM), DOM manipulation, schema validation, XSL transformation, and XML signature and encryption. I've learned to treat benchmarks with a 20-pound bag of rock salt until proven otherwise. However, this product gets at least one thing right. It lets you plug in "Any valid XML file" so you can test parsers on the kind of documents you're interested in rather than on whatever the benchmark vendor has. Most parsers exhibit wildly varying performance characteristics depending on the type of XML document (large or small, record-like or narrative, many attributes or few attributes, etc.). It's not clear whether or not this benchmark can test well-formed but invalid documents.
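If you do want to time parsers on your own documents, a bare-bones harness is easy to write. This sketch (`ParseBenchmark` is hypothetical) times the default JAXP SAX parser after some warm-up runs so the JIT has a chance to compile the hot paths. It says nothing about how XML Benchmark itself works, and single-process timings like these are only rough.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class ParseBenchmark {
    // Average wall-clock nanoseconds per parse of the given document.
    static double averageNanos(byte[] document, int warmupRuns, int timedRuns) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        DefaultHandler noop = new DefaultHandler(); // measure parsing, not application work
        try {
            for (int i = 0; i < warmupRuns; i++) {
                factory.newSAXParser().parse(new ByteArrayInputStream(document), noop);
            }
            long start = System.nanoTime();
            for (int i = 0; i < timedRuns; i++) {
                factory.newSAXParser().parse(new ByteArrayInputStream(document), noop);
            }
            return (System.nanoTime() - start) / (double) timedRuns;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Feed it a representative sample of your real documents (large and small, record-like and narrative) rather than a single file.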

Sunday, April 6, 2003

Daniel Veillard's released version 2.5.6 of libxml2, the XML C library for Gnome and version 1.0.29 of libxslt, the GNOME XSLT library for C and C++. These releases fix assorted bugs and make a few speedups and portability improvements.


Brendan Macmillan's Java Serialization for XML (JSX) 2 0.8.0 can convert Java objects into streams of XML and read the objects back from the streams. To use it, replace ObjectOutputStream with JSX.ObjectWriter and ObjectInputStream with JSX.ObjectReader. This release adds support for the Externalizable interface.


Jaxe 1.4 is an open source (GPL) XML GUI editor written in Java 1.3. It is configurable with an XML schema and a configuration file, supports validation at element insertion, is customisable via Java modules, and can use XSLT to display documents as XHTML. A little unusually, the configuration files are written in French--ENCODAGE instead of ENCODING, BALISE instead of TAG, etc. The user interface has been localized into French, English, and German.


Alexandre Brillant has released FastParser 1.4, a $50 shareware, non-validating, SAX parser for Java. He claims this parser is faster than Xerces and Crimson (which are known not to be the fastest parsers out there). However, the benchmarks only test one file, and it's not clear from his results whether FastParser was used in a mode that doesn't perform full well-formedness checking.


Sun has posted the first beta of Java 1.4.2 (Java 2 Software Development Kit 1.4.2) for Linux, Windows, and Solaris. Changes relevant to XML include:

  • The bundled Xalan XSLT processor is upgraded to version 2.4.1.
  • Various bugs have been fixed in the bundled Crimson parser. (They still haven't upgraded to Xerces.)
  • An entityExpansionLimit Java system property enables applications to limit the maximum number of entity expansions. The parser throws a fatal error once it has reached the entity expansion limit. This prevents the billion-laughs attack.
  • If set, the http://apache.org/xml/features/disallow-doctype-decl SAX feature causes a fatal error to be thrown if an XML document contains a DOCTYPE declaration.
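
A sketch of turning that feature on through JAXP. It assumes a Xerces-derived parser such as the one bundled with later JDKs; whether the Crimson parser in this 1.4.2 beta recognizes the feature is not something the sketch establishes. `NoDoctype` is a hypothetical helper.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class NoDoctype {
    // Returns true if the parser refuses the document because it has a DOCTYPE.
    static boolean rejects(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            return true; // fatal error reported for the DOCTYPE declaration
        } catch (Exception e) {
            throw new RuntimeException(e); // e.g. the feature is not recognized
        }
    }
}
```

Refusing the DOCTYPE outright blocks internal-subset entity tricks such as the billion-laughs attack without needing to pick an expansion limit.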
Saturday, April 5, 2003

The OpenOffice Project has posted the first beta of OpenOffice 1.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. New features in this release include import and export of PDF, Macromedia Flash, DocBook, several PDA Office file formats, flat XML, and XHTML, as well as complex text layout for languages such as Thai, Hindi, Arabic, and Hebrew.


The OpenOffice Software Development Kit has also been released. The kit provides tools and documentation for programming OpenOffice.org extensions (UNO components).


IBM has updated the Informix XSLT DataBlade, an Informix database module that creates new SQL functions, called user-defined routines (UDRs), that use XSLT style sheets and libxslt (the Gnome C XSLT library) to transform documents from one format to another. This release adds HP-UX 32-bit support.

Friday, April 4, 2003

The W3C Voice Browser Working Group has published a new public working draft of Semantic Interpretation for Speech Recognition. According to the abstract,

This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification.

Semantic Interpretation may be useful in combination with other specifications, such as the Stochastic Language Models (N-Gram) Specification, but their use with N-grams has not yet been studied.

The results of semantic interpretation are describing the meaning of a natural language utterance. The current specification represents this information as an EcmaScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity is defining a data format (EMMA) for representing information contained in user utterances, and has published the requirements for this data format (EMMA Requirements). It is believed that semantic interpretation will be able to produce results that can be included in EMMA.

Thursday, April 3, 2003

Version 2.0 of the payware <Oxygen/> XML editor has been released. Oxygen supports XML, XSL, DTDs, and the W3C XML Schema Language. Version 2.0 adds tree-based editing, spell-checking, XML Catalog support, soft-wrapping of text, syntax highlighting for Java, C, C++, SQL, PHP, and Perl, and more. Oxygen requires a recent Java Virtual Machine. It costs $74.


The W3C Web Ontology Working Group has posted six last call working drafts covering various aspects of the Web Ontology Language (OWL):

Quoting from the overview document,

The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine readability of Web content than that supported by XML, RDF, and RDF Schema by providing additional vocabulary along with a formal semantics. OWL has three increasingly-expressive sublanguages: OWL Lite, OWL DL, and OWL Full.

Comments on all these are due by May 9.

Wednesday, April 2, 2003

The first alpha of Mozilla 1.4 has been posted for the usual batch of platforms: Windows, Linux, Open VMS, AIX, Solaris, Irix, HP/UX, and Mac OS X (in fact, pretty much everything except Mac OS 9, which the Mozilla Project has effectively abandoned). The popular open source web browser supports XML, CSS, XSLT, XHTML, HTML, DOM Level 1 and Level 2, and JavaScript. Java and Flash support are available through plug-ins. New features include dynamic image and table resizing in Composer, smooth scrolling (disabled by default), and improvements to spam filtering, as well as bug fixes addressing speed, stability, standards support, and website compatibility. Bookmarks now include a root level folder, the ability to have two differently named bookmarks pointing at the same location, site icons in the Bookmark Manager and Bookmarks Sidebar, and labelled separators.


The W3C DOM Working Group has posted the Candidate Recommendation of Document Object Model (DOM) Level 3 XPath Specification. At first glance the changes since the last call working draft seem quite minor, but I need to read it more closely. Now I have to update Chapter 16 of Processing XML with Java again. Hmmm, maybe not. I've now compared the Java language bindings from the previous working draft and this one, and I couldn't find any differences at all. All the changes seem to be in the spec prose, not the actual interfaces. I have to say I'm not very thrilled by this API, but it's no worse than the rest of DOM.


The W3C Document Object Model Working Group has posted the last call working draft of the Document Object Model (DOM) Level 3 Events Specification. According to the abstract,

This specification defines the Document Object Model Events Level 3, a generic platform- and language-neutral event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. The Document Object Model Events Level 3 builds on the Document Object Model Events Level 2
Tuesday, April 1, 2003

Daniel Veillard's released version 2.5.5 of libxml2, the XML C library for Gnome. This release now includes full RELAX NG support and fixes a number of bugs in URIs, validation, XPath, xmlReader, and the HTML parser.


Version 0.8.2 of TM4J, an open source Java toolkit for parsing, manipulating and exporting topic map data, has been released. "TM4J supports import of XTM and LTM topic map interchange formats; a complete data model with a variety of persistence mechanisms; an implementation of the tolog topic map query language; and a collection of command-line tools and programming utility classes to make it easy to work with topic map data in an application development environment."


Chiba is an open source, web-based implementation of the XForms Candidate Recommendation that enables XForms to be used in current browsers without plugins or special requirements on the client-side.

Monday, March 31, 2003

The XML Apache Project has released version 2.4.0 of Xerces-J, the popular open source XML parser for Java. This release makes a couple of small changes to the Xerces native interface, adds support for the XML 1.1 candidate recommendation, updates the DOM3 Load and Save implementation to the latest working draft, and fixes a few bugs.


Oleg Tkachenko has released nxslt 1.1, a Windows command line utility for accessing the .NET XSLT engine. New features in this release include URI resolvers for xsl:include, xsl:import, and document() and support for multiple result documents. nxslt is written in C# and requires the .NET Framework version 1.0 to be installed.

Sunday, March 30, 2003

I've returned from Software Development 2003 West, where a good time was had by all. Agile programming was a particularly hot topic this year. Java and C++ were about the same. Interest in XML seems to be tapering off. I suspect most programmers have learned what they need to know about it. We now return you to your regularly scheduled programming.


Jochen Wiedmann's released JaxMe 1.5.6, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with bean managed persistence (BMP). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release introduces an alternative implementation of the manager factory with lazy instantiation. This is a bug fix release.


Jean-Marc Vanel has published a library of assorted XSLT transforms under the LGPL. These include identity and near identity transforms, source code generators, schema transforms, and more.


Andrew Thompson has released Unicode Font Info 1.5.1, an open source font inspection tool for Mac OS X 10.2 that supports Unicode 3.2 and enables users to navigate huge fonts with tens of thousands of glyphs.

Friday, March 14, 2003

I'll be travelling for the rest of the month, so updates will be a tad slow here until April. I leave tonight for the XML & Web Services 2003 show in London next week where I'll be talking about DOM Level 3, XSLT 2, XQuery, and other bleeding edge topics. While in London, I'll also be visiting the Extreme Tuesday Club and talking about XOM at a joint meeting of UKUUG and XML UK. Advance registration is required for both events. I think it's members only for the XMLUK/UKUUG meeting, but you were planning on joining one of those anyway, weren't you?

The following week I'll be eight time zones away in Santa Clara for Software Development 2003 West where I'll be discussing various topics related to XML and Java throughout the week, including reprising my Hands On XSLT course first given at SD East in Boston last year. I don't have any user group events planned for this trip, but Thursday night, March 27, I will be hosting a XOM Birds-of-a-Feather at the pool at the Westin Santa Clara. It's officially for show attendees only; but since I'm paying for the beer, I figure I can invite anyone I want. :-) Details will be posted on the xom-interest mailing list, or drop me a private e-mail if you want to come.


The Mozilla Project has released Mozilla 1.3 (and the day after I burnt all the CDs for Hands-On XSLT with Mozilla 1.2, damn it!). The popular open source web browser supports XML, CSS, XSLT, XHTML, HTML, DOM Level 1 and Level 2, and JavaScript. Java and Flash support are available through plug-ins. Version 1.3 is available for Windows, Linux, Open VMS, AIX, Solaris, Irix, HP/UX, and Mac OS X. Mac OS 9 is not supported, which is disappointing, especially since Mozilla 1.2.1 was extremely unstable on Mac OS 9; and I was hoping for a release that would stop the frequent crashing in that environment. Version 1.3 adds junk-mail filtering, image auto-sizing, an API for rich text editing in webpages, newsgroup filters, dynamic profile switching, and many other small new features and bug fixes.


The Mozilla Project has also posted version 0.7 of Camino (née Chimera), a small footprint, native Cocoa Mac OS X web browser based on Mozilla's Gecko layout engine that includes lots of XML support. Unlike Mozilla, this is only a browser: no e-mail client, news reader, chat program, or dog walker. Besides the name, which was changed for trademark reasons, changes in 0.7 include a new Download Manager with auto-download and auto-dispatch, compatibility with URL Manager Pro, a Page Text Encodings menu, Global History in the sidebar, new toolbar buttons, dragging of images and links to the desktop, Shockwave Directory support, Rendezvous support for local FTP and web servers, and proxy auto-config.


ActiveState has released Visual XSLT 1.7.9, an XSLT development environment plug-in for Visual Studio .NET 2003. Features include an XSLT Debugger, an XSLT Editor, XPath workshop, automatic output preview, and a template browser. Visual XSLT is $295 payware. Upgrades from 1.6 are free.


IBM has released Version 5.2 of XML for C++, a schema-validating XML parser based on Xerces-C 2.2. It's not clear what's changed since version 5.1 last month. This is probably just a bug fix release.


Norm Walsh has released version 1.78 of his DSSSL stylesheets for DocBook. DocBook is an XML application for technical documents. I used it to write Processing XML with Java (though I used the XSLT style sheets instead of the DSSSL ones). This release makes several small fixes and improvements.

Thursday, March 13, 2003

The Software Development 2003 West conference taking place in Santa Clara in two weeks (March 24 to March 28) is looking for a few more volunteers to man doors, check badges, collect evaluation forms and the like. In return for sitting in front of a door for a day (and listening to the sessions in that room), you get free admission for a day of attending anything you want at the conference.


Florent Tournois's XiMoL 0.7 is an open source data binding library for C++ based on the Standard Template Library (STL). Each object has its own reader/writer (operator<< and operator>>). XiMoL is published under the LGPL.

Wednesday, March 12, 2003

soft4science has released the MathML .NET Control 1.0, a $249 payware WYSIWYG equation editor control for .NET, written in managed C#, that supports MathML presentation markup. This is for Windows only, naturally.

Tuesday, March 11, 2003

The Jakarta Apache Project has released JXPath 1.1, a class library that "applies XPath expressions to graphs of objects of all kinds: JavaBeans, Maps, Servlet contexts, DOM etc, including mixtures thereof." New features in version 1.1 include:

  • JDOM support
  • A getNode() method that returns the raw value without converting it to a String.
  • DynaBeans support.
  • The format-number() function from XSLT.

Hugues Cassé has released Elf 0.2, a port of XOM to Python. This is a bug fix release that adds support for the copy() method and XInclude.


Pekka Enberg's posted version 0.2.4 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release improves newline handling.


Toni Uusitalo's posted Parsifal 0.6.78, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Parsifal is in the public domain. This is a bug fix release.


IBM's alphaWorks has released version 1.3 of the XML Wrapper Generator, a graphical tool that integrates XML data sources into a DB2 database. The tool loads XML schema files, "shreds" them to a relational schema, and generates appropriate NICKNAME and VIEW statements. This release fixes bugs and adds "better support for all kinds of XML Schemas."


Lord Pixel has released Unicode Font Info 1.5, an open source font inspection tool for Mac OS X that enables you to display Unicode fonts with up to tens of thousands of glyphs and copy characters to the clipboard.

Monday, March 10, 2003

The Apache Commons Team has released Digester 1.4.1, a SAX-based XML->object mapper, designed primarily for parsing XML configuration files though it has other uses too. Digester is configured through an XML -> Java object mapping module, which triggers actions whenever a pattern of nested XML elements is recognized.
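The pattern-triggered style Digester uses can be sketched with plain SAX: keep a stack of open element names and fire an action whenever the current path matches a registered pattern. This is a hypothetical illustration, not the Digester API; PathRules and its on() method are invented names, and only exact slash-separated paths are matched.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical mini rule engine; not the Commons Digester API. */
final class PathRules extends DefaultHandler {

    private final Map<String, Consumer<String>> rules = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();        // root first
    private final Deque<StringBuilder> text = new ArrayDeque<>(); // innermost on top

    /** Registers an action for an exact path such as "config/server/port". */
    PathRules on(String pattern, Consumer<String> action) {
        rules.put(pattern, action);
        return this;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.addLast(qName);
        text.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        StringBuilder current = text.peek();
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        // Match the full path of the element that is closing.
        Consumer<String> action = rules.get(String.join("/", path));
        String content = text.pop().toString().trim();
        if (action != null) {
            action.accept(content);
        }
        path.removeLast();
    }

    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalArgumentException("Cannot parse input", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, registering "config/server/port" fires only for a port element nested inside server, not for one directly under config.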

Sunday, March 9, 2003

I've posted version 1.0d11 of my XIncluder library for SAX, DOM, and JDOM. There are two main new features in this release:

  • The SAXXIncluder supports catalog resolution using the Apache Commons Project's Resolver library (as well as other entity resolvers).
  • All three XIncluders now maintain the base URI information item of included documents by insertion of xml:base attributes where necessary. This requirement was added in the latest XInclude candidate recommendation.

The API is unchanged. However, my testing of this software is very informal and haphazard so there are doubtless bugs and probably some new ones. I really need a good unit testing framework for programs that generate XML documents. XIncluder is published under the LGPL.

Saturday, March 8, 2003

The IETF has released three new RFCs covering internationalized domain names:

Eventually, this will enable domain names to contain characters like ο and Ι. You thought you already could use these characters in domain names? Actually, no, you can't. You see, those are the Greek letters omicron and capital iota, which, unless you look very closely, are pretty much the same as the ASCII letters o and l. (How similar they are depends on which fonts you're using. For some people, they won't even be close; but on many systems it's plenty good enough to fool someone; certainly good enough to get a casual user to enter their password into a form on paypaΙ.com.) There are many more spoof buddies like this throughout Unicode. For instance, Chinese companies may well want to register both simplified and traditional forms of their names.

These security issues were raised repeatedly with the working groups by multiple people throughout the process, myself included; and solutions were proposed that allowed internationalized domain names while avoiding much of the damage these proposals cause; but the working group consistently refused to address them in any meaningful way. The most common response was that since it was already possible to spoof apple.com with app1e.com, it didn't matter that they were making the problem about a thousand times worse. I think I may register ιetf.org now, and see what happens. :-)

In any case, the onus now falls on the registrars not to register spoofed domain names (but given that app1e.com and MICROS0FT.COM are already registered, and the registrars are busily taking everyone's money while disclaiming responsibility for anything, I don't hold out much hope for that). Application software can attempt to watch out for such fraud, but this is going to affect every bit of software that displays or activates a URL. There's really no hope of avoiding the problem now. I predict that no later than 2004 there'll be reports of major frauds perpetrated through this vector.
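One thing application software could do is flag host names whose labels mix scripts. Below is a minimal sketch of that heuristic using Character.UnicodeScript (Java 7 and later); the class and method names are hypothetical, and this is not a description of what any browser or registrar actually does. It catches paypaΙ.com and ιetf.org, but by design it misses app1e.com and says nothing about simplified versus traditional Chinese.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

final class MixedScriptCheck {

    /**
     * True if a single label contains letters from more than one Unicode
     * script. Digits, hyphens, and other COMMON or INHERITED code points
     * are ignored, so "app1e" counts as pure Latin.
     */
    static boolean isMixedScript(String label) {
        Set<UnicodeScript> scripts = EnumSet.noneOf(UnicodeScript.class);
        label.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON && script != UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts.size() > 1;
    }

    /** Checks each dot-separated label of a host name independently. */
    static boolean looksSpoofed(String host) {
        for (String label : host.split("\\.")) {
            if (isMixedScript(label)) {
                return true;
            }
        }
        return false;
    }
}
```

A whole-script spoof, where every letter of a label is Greek or Cyrillic, passes this test, which is one reason a per-label script check can only be a partial defense.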

Friday, March 7, 2003

SysOnyx Inc. has released xmlLinguist 1.0, a $79.95 payware Text-to-XML Translator that enables you to map and convert structured flat text files such as comma separated values to XML.
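For readers wondering what such a mapping involves, here is a deliberately naive sketch: the header line supplies element names and each later line becomes a row element. It is not how xmlLinguist works; the rows/row element names are hypothetical, and it assumes no quoted fields, no embedded commas, and header names that are already legal XML names.

```java
final class CsvToXml {

    /** Escapes the characters that cannot appear literally in element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Converts header-plus-rows CSV into <rows><row><col>...</col></row></rows>. */
    static String convert(String csv) {
        String[] lines = csv.split("\r?\n");
        String[] header = lines[0].split(",", -1);
        StringBuilder xml = new StringBuilder("<rows>");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = lines[i].split(",", -1);
            xml.append("<row>");
            for (int j = 0; j < header.length; j++) {
                String name = header[j].trim();
                String value = j < fields.length ? fields[j] : ""; // pad short rows
                xml.append('<').append(name).append('>')
                   .append(escape(value))
                   .append("</").append(name).append('>');
            }
            xml.append("</row>");
        }
        return xml.append("</rows>").toString();
    }
}
```

Real CSV needs a proper tokenizer for quoted fields, and real column headers often need renaming before they can serve as element names; that mapping step is presumably where a dedicated tool earns its keep.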

Thursday, March 6, 2003

Sun has submitted Java Specification Request 206 (JSR-206) Java API for XML Processing (JAXP) 1.3 to the Java Community Process. The main goal of this release is to bring JAXP up to spec with XML 1.1, Namespaces 1.1, DOM Level 3, and SAX 2.0.1. It also suggests:

  • Standardizing some additional classes such as QName and NamespaceContext that are already being used by other specifications such as JAXB.
  • Protecting against the billion laughs attack
  • Adding an XPath query API for DOM objects
  • Supporting grammar pre-parsing and caching
  • XInclude support

Personally, I'd prefer the working group spend more time on more carefully specifying what the methods already in the javax.xml packages actually do before they worry about new features. A lot of the methods there are grossly underspecified. For instance, how does the identity transform behave when transforming a DOM Document object that has namespaced elements but no namespace declaration attributes? Does a SAXResult invoke startDocument() and endDocument() in its ContentHandler when the transform only produces a document fragment? Comments are due by March 17.
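The first question is easy to reproduce. This sketch builds exactly that case, a DOM element created with createElementNS but with no xmlns:p attribute, and runs it through the JAXP identity transform. The namespace URI and class name are hypothetical. Whether the serialized output gains an xmlns:p declaration depends on the processor you run it against, which is the point of the question.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

final class IdentityProbe {

    // Hypothetical namespace URI used only for this probe.
    static final String NS = "http://example.com/ns";

    /** Serializes a namespaced DOM that has no namespace declaration attributes. */
    static String probe() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            // The namespace lives on the node itself; no xmlns:p attribute is added.
            doc.appendChild(doc.createElementNS(NS, "p:root"));

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Printing IdentityProbe.probe() under different JAXP implementations is a quick way to see whether they agree.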


Sun has posted version 0.2.2 of xmlroff, an open source XSL Formatting Objects to PDF converter. xmlroff is written in C for Linux, and relies on the libxml2, libxslt, and the GLib, GObject and Pango libraries from GTK+ and GNOME (though neither GTK+ nor Gnome is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This release fixes a few bugs and improves documentation.


Morphon has released the Morphon XML Editor 3.0, a $150 payware XML editor that provides WYSIWYG, source, and tree views. Version 3.0 adds live spell checking, printing, print preview, and source code editing.

Wednesday, March 5, 2003

The RSS feed broke again this morning for a completely new reason. Apparently, I need to upload UTF-8 files in binary format rather than text or the non-ASCII characters get corrupted in transit. I've fixed it, and it should update by noon EST.


The W3C Web Services Description Working Group has published a new working draft of Web Services Description Language (WSDL) Version 1.2. Quoting from the introduction:

Web Services Description Language (WSDL) provides a model and an XML format for describing Web services. WSDL enables one to separate the description of the abstract functionality offered by a service from concrete details of a service description such as "how" and "where" that functionality is offered.

This specification defines a language for describing the abstract functionality of a service as well as a framework for describing the concrete details of a service description. The companion specification, WSDL Version 1.2: Bindings [WSDL 1.2 Bindings] defines a language for describing such concrete details for SOAP 1.2 [SOAP 1.2 Part 1: Messaging Framework], HTTP [IETF RFC 2616] and MIME [IETF RFC 2045].

WSDL describes Web services starting with the messages that are exchanged between the service provider and requestor. The messages themselves are described abstractly and then bound to a concrete network protocol and message format. A message consists of a collection of typed data items. An exchange of messages between the service provider and requestor are described as an operation. A collection of operations is called a port type. A service contains a collection of ports, where each port is an implementation of a portType, which includes all the concrete details needed to interact with the service.


Sun's released the Java Architecture for XML Binding 1.0 (JAXB) specification, reference implementation, API docs, and technology compatibility kit. JAXB compiles an XML schema into one or more Java classes. (First mistake: JAXB assumes there's a schema. Second mistake: It assumes the schema is written in the W3C XML Schema Language.) JAXB can unmarshal schema-valid XML into Java objects; read, update and validate the Java objects against the schema, and write the result back out as XML. Unfortunately, this API is fundamentally flawed. The misconception that cripples it appears on the very first page of the spec:

An XML document need not follow any rules beyond the well-formedness criteria laid out in the XML 1.0 specification. To exchange documents in a meaningful way, however, requires that their structure and content be described and constrained so that the various parties involved will interpret them correctly and consistently. This can be accomplished through the use of a schema. A schema contains a set of rules that constrains the structure and content of a document's components, i.e., its elements, attributes, and text. A schema also describes, at least informally and often implicitly, the intended conceptual meaning of a document's components. A schema is, in other words, a specification of the syntax and semantics of a (potentially infinite) set of XML documents. A document is said to be valid with respect to a schema if, and only if, it satisfies the constraints specified in the schema.

Documents can in fact be exchanged meaningfully without schemas, and without correct and consistent interpretation. You may, after all, want to do something very different with a document than I want to do with it. You might want <month>January</month> to be a date and I might want it to be a string, a search key, or even a proper name. Your use and understanding of the data need not affect my use or understanding of the same data. Nor do documents need to be valid in order to be useful. For instance, the page you're reading now claims to be XHTML, but it's not valid according to the XHTML DTD. Did you notice? Do you care? JAXB's view of XML is far too limited. The mindset and preconceptions that led to JAXB were formed in the realm of compiled, binary systems that operate inside a single computer. The heterogeneous, networked world of XML documents is far broader than that. JAXB is limited to the impoverished domain of single systems where everything agrees on what documents mean, how they're formed, and what should be done with them.


Sun has also released the Java Web Services Developer Pack 1.1 which includes the reference implementation for JAXB 1.0 as well as

  • Java API for XML Messaging (JAXM) v1.1.1
  • Java API for XML Processing (JAXP) v1.2.2 (with XML Schema support)
  • Java API for XML Registries (JAXR) v1.0_03
  • Java API for XML-based RPC (JAX-RPC) v1.0.3
  • SOAP with Attachments API for Java (SAAJ) v1.1.1
  • JavaServer Pages Standard Tag Library (JSTL) v1.0.3
  • Java WSDP Registry Server v1.0_04
  • Ant Build Tool 1.5.1
  • Apache Tomcat 4.1.2 dev container (plus fixes)
Tuesday, March 4, 2003

Andy Clark has posted an update for the CyberNeko Tools for XNI. This is a collection of XML tools written specifically to take advantage of the Xerces Native Interface API in Xerces2 including the NekoHTML parser and the NekoDTD parser. This release fixes some bugs in the NekoHTML parser.

Monday, March 3, 2003

Hmmm, looks like there's another bug in the RSS feed today. The 1.0.6 in the Gnosis Utils version number tripped the sentence end detection algorithm early. That shouldn't have happened. It only looks for a period followed by a space. I'm looking into this.

OK, I found the bug. I wasn't recognizing the colon as a sentence ender, so the algorithm fell back to the first period. It will be fixed with the next update at 10:00. It occurs to me that I should probably test my algorithm against all the news from the last year to see what else it's missing. However, before I can do that, I'd have to turn all the old news into well-formed XHTML too, a non-trivial task. I'd love to use more XML tools on this site, but a lot of the systems in place predate XML by a couple of years, so they use old, ugly regular expression and comment hacks and the like. I really should knuckle down and replace all that junk with clean XML-based tools.
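The fix described above amounts to treating the colon as a terminator alongside the period, and requiring whitespace after it so that version numbers like 1.0.6 never end a sentence. Here is a hypothetical reconstruction (the site's actual code isn't shown); the '!' and '?' terminators are an added assumption.

```java
final class FirstSentence {

    /**
     * Returns the text up to and including the first '.', ':', '!' or '?'
     * that is followed by whitespace or ends the string; returns the whole
     * string if no such terminator exists.
     */
    static String firstSentence(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean terminator = c == '.' || c == ':' || c == '!' || c == '?';
            // A period inside "1.0.6" is followed by a digit, so it is skipped.
            boolean atBoundary = i + 1 == text.length()
                || Character.isWhitespace(text.charAt(i + 1));
            if (terminator && atBoundary) {
                return text.substring(0, i + 1);
            }
        }
        return text;
    }
}
```

With the colon included, an item that ends its first sentence with a colon before a bullet list, like the Gnosis Utils entry, is cut at the colon instead of running on to some later period.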


Frank McIngvale has released the Gnosis Utils 1.0.6, a public domain collection of Python modules for processing XML:

  • xml.pickle serializes objects to and from XML using an API compatible with the standard pickle module
  • xml.objectify turns arbitrary XML documents into Python objects
  • xml.validity checks validity against DTDs or schemas
  • xml.indexer provides full text indexing and searching of XML documents

Jochen Wiedmann's released JaxMe 1.5.5, "yet another open source Java/XML binding tool in the style of Castor or Zeus" that sits on top of SAX2. JaxMe provides code generators that read a W3C XML schema and generate code for parsing conformant XML documents into corresponding Java objects, saving those objects into a database or reading those Java objects from a database and converting them into XML. JaxMe supports SQL databases and Tamino. It includes an integrated application framework and a generator for EJB entity beans with bean managed persistence (BMP). It's based on a reduced subset of the W3C XML schema language that does not support choices, references, or recursion. This release introduces an alternative implementation of the manager factory with lazy instantiation.

Sunday, March 2, 2003

Emmanuil Batsis has posted an alpha of Sarissa, an open source (GPL) JavaScript library for processing XML under Mozilla and Internet Explorer. It provides methods to obtain DOM Document/XMLHTTP objects, synchronous and asynchronous loading, and XSLT transformations, implements some non-standard IE extensions for Mozilla, and adds NodeType constants for IE.


Toni Uusitalo's posted Parsifal 0.6.7, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Parsifal is in the public domain. This is a bug fix release.

Saturday, March 1, 2003

The W3C has posted a note on XML Advanced Electronic Signatures (XAdES). According to the abstract, XAdES

extends the IETF/W3C XML-Signature Syntax and Processing specification [XMLDSIG] into the domain of non-repudiation by defining XML formats for advanced electronic signatures that remain valid over long periods and are compliant with the European "Directive 1999/93/EC of the European Parliament and of the Council of 13 December 1999 on a Community framework for electronic signatures" [EU-DIR-ESIG] (also denoted as "the Directive" or the "European Directive" in the rest of the present document) and incorporate additional useful information in common uses cases. This includes evidence as to its validity even if the signer or verifying party later attempts to deny (repudiates) the validity of the signature.

An advanced electronic signature aligned with the present document can, in consequence, be used for arbitration in case of a dispute between the signer and verifier, which may occur at some later time, even years later.

Friday, February 28, 2003

The W3C DOM Working Group has updated the Document Object Model (DOM) Level 3 Core and Document Object Model (DOM) Level 3 Load and Save working drafts. Changes to the core include:

  • A new DOMStringList interface defines a read-only list of strings
  • A new NameList interface defines a read-only list of names and namespace URIs
  • A new DOMImplementationList interface defines a read-only list of DOMImplementations. There's also a new DOMImplementationSource interface and a getDOMImplementations method in the Document interface. Together with the DOMImplementationRegistry class these make it possible to load DOM implementations in an implementation independent fashion.
  • The getInterface method in Node has been renamed getFeature.
  • schemaType has been renamed schemaTypeInfo
  • TypeInfo.name has been changed to TypeInfo.typeName
  • TypeInfo.namespace has been changed to TypeInfo.namespaceURI
  • The Attr interface has an isId method.
  • The setIdAttribute, setIdAttributeNS, and setIdAttributeNode methods in Element can now be used for non-ID attributes. (Would somebody please explain to me what the point of this is? DOM just keeps getting more baroque with each release. By my count, the Element interface now has seven separate methods to add an attribute to an element, all of which do pretty much the same thing. XOM manages to get by with a paltry one method for adding attributes to elements.)
  • parameters: isId.
  • Parameters for DOMConfiguration are now objects instead of booleans
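For comparison with this draft, here is a sketch that uses the DOM Level 3 interfaces as they eventually shipped in org.w3c.dom with Java 5 and later; some names may differ from the working draft described above. It looks up an implementation through DOMImplementationRegistry by feature string ("XML 3.0" is an assumption about what the JDK's default registry can satisfy), then uses setIdAttribute to turn an ordinary attribute into an ID. The class name DomLevel3Ids and the sku attribute are hypothetical.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

final class DomLevel3Ids {

    /** Finds a DOM implementation by feature string rather than by class name. */
    static DOMImplementation lookup(String features) {
        DOMImplementationRegistry registry;
        try {
            registry = DOMImplementationRegistry.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create DOM registry", e);
        }
        DOMImplementation impl = registry.getDOMImplementation(features);
        if (impl == null) {
            throw new IllegalStateException("No implementation supports " + features);
        }
        return impl;
    }

    /** Marks a plain attribute as an ID, then finds the element through it. */
    static boolean idLookupWorks() {
        Document doc = lookup("XML 3.0").createDocument(null, "catalog", null);
        Element item = doc.createElement("item");
        item.setAttribute("sku", "A1");
        doc.getDocumentElement().appendChild(item);

        // Nothing declares sku as an ID type; setIdAttribute makes it one.
        item.setIdAttribute("sku", true);

        return item.getAttributeNode("sku").isId()
            && doc.getElementById("A1") == item;
    }
}
```

Without the setIdAttribute call, getElementById("A1") would find nothing, because in a document with no DTD or schema no attribute is an ID by default.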

Changes in the Load and Save spec include:

  • The optional "ElementLS interface provides a convenient mechanism by which the children of an element can be serialized to a string, or replaced by the result of parsing a provided string."
  • Load and Save is now in sync with the latest Core draft, including the DOMConfiguration interface.
  • DOMBuilder has async and busy properties that specify whether the builder is asynchronous and currently loading an object, respectively

IBM's alphaWorks has updated their XML Registry, a data management system that provides services for XML artifacts including DTDs, schemas, stylesheets, and instance documents. Developers can use the registry to obtain an XML artifact automatically, search or browse for an XML artifact, deposit an XML artifact with or without related data, and register an XML artifact without deposit. Users can search for registered objects based on their metadata. New features in this release "include group operations (deleting a list of documents, creating versions, etc.); simplified searching; and registry entry properties (arbitrary document attribution)." This runs on AIX, Linux, Solaris, and Windows NT/2000.

Thursday, February 27, 2003

Daniel Veillard's released version 2.5.4 of libxml2, the XML C library for Gnome and version 1.0.27 of libxslt, the GNOME XSLT library for C and C++. The new version of libxml2 fixes a number of bugs involving XInclude and RELAX NG support. The new version of libxslt fixes a couple of bugs in namespace handling and Windows path usage.


The W3C CSS working group has posted an updated last call working draft of CSS3 module: text. This document describes the basic text formatting properties for CSS3 including:

  • writing-mode
  • direction
  • glyph-orientation-vertical
  • glyph-orientation-horizontal
  • unicode-bidi
  • text-script
  • text-align
  • text-justify
  • text-align-last
  • min-font-size
  • max-font-size
  • text-justify-trim
  • text-kashida-space
  • text-indent
  • line-break
  • word-break-CJK
  • word-break-inside
  • word-break
  • wrap-option
  • linefeed-treatment
  • white-space-treatment
  • all-space-treatment
  • white-space
  • text-overflow-mode
  • text-overflow-ellipsis
  • text-overflow
  • letter-spacing
  • word-spacing
  • punctuation-trim
  • text-autospace
  • kerning-mode
  • kerning-pair-threshold
  • text-underline-style
  • text-line-through-style
  • text-overline-style
  • text-underline-color
  • text-line-through-color
  • text-overline-color
  • text-underline-mode
  • text-line-through-mode
  • text-overline-mode
  • text-underline-position
  • text-blink
  • text-underline
  • text-line-through
  • text-overline
  • text-decoration
  • text-shadow
  • line-grid-mode
  • line-grid-progression
  • line-grid
  • text-transform
  • hanging-punctuation
  • text-combine

Many of these should be familiar from CSS2. The new ones mostly address the needs of East Asian and bidirectional text. Comments are due by March 5.


The W3C XML Schema Working Group has issued a number of new errata for the W3C XML Schema Recommendation.

Wednesday, February 26, 2003

Opera Software ASA has released version 7.0.2 of their namesake web browser for Windows to fix a few bugs. A Japanese version is also available for the first time in the 7.x series.


The Jakarta Apache Project has posted the second beta of JXPath 1.1, a class library that "applies XPath expressions to graphs of objects of all kinds: JavaBeans, Maps, Servlet contexts, DOM etc, including mixtures thereof." New features in version 1.1 include:

  • JDOM support
  • A getNode() method that returns the raw value without converting it to a String.
  • DynaBeans support.
  • The format-number() function from XSLT.

Changes since the first beta are mostly bug and build fixes.


I've fixed the XSLT stylesheet that produces the RSS feed for this site so that it is no longer tripped up by middle initials. As is very often the case with complicated XSLT problems, the answer involved named templates, parameters, and recursion. Several people suggested regular expression based solutions, but that's way too hard to implement in XSLT 1.0. Right now I'm just checking to see if the character two characters before the period is a space; and, if it is, reinvoking the same template on the remainder of the first paragraph to extract the rest of the sentence.
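The sentence-boundary rule described above translates directly into a small recursive function. This is a Python transcription for illustration, not the stylesheet itself; the name first_sentence is hypothetical.

```python
def first_sentence(text, start=0):
    """Return text up to and including the first sentence-ending period.

    A period whose character two positions earlier is a space (as in
    "Suren A. Chilingaryan") is taken to follow a middle initial, so the
    search recurses on the remainder of the text past that period.
    """
    dot = text.find(".", start)
    if dot == -1:
        return text  # no usable period: fall back to the whole paragraph
    if dot >= 2 and text[dot - 2] == " ":
        return first_sentence(text, dot + 1)  # skip the initial and try again
    return text[: dot + 1]
```

For instance, first_sentence("Suren A. Chilingaryan has released some XML parser benchmark results. The test framework is open source.") stops after "results.", skipping the period after "A". The rule has an obvious blind spot: in "...a new version of C. It is fast." the period after "C" also looks like an initial, so the whole string comes back.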


The Unicode Consortium has posted a beta version of the Unicode 4.0 Character Database for public comment. Comments are due by March 21, and comments that get in by March 3 are much appreciated. Unicode 4.0 will add 1,226 new characters spread out over several dozen blocks. New character blocks in this release include:

  • Limbu
  • Tai Le
  • Khmer Symbols
  • Phonetic Extensions
  • Yijing Hexagram Symbols
  • Linear B
  • Aegean Numbers
  • Ugaritic
  • Shavian
  • Osmanya
  • Cypriot Syllabary
  • Tai Xuan Jing Symbols
  • Variation Selectors Supplement

In addition, various characters have been added to existing blocks, including Latin Extended B, IPA, Greek and Coptic, Arabic, Syriac, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Kannada, and Khmer. Note that this is a more traditional beta in which features (i.e., new characters) have been frozen for this release. The current review focuses on editorial changes and out-and-out mistakes.

Tuesday, February 25, 2003

The period after the middle initial in the programmer's name in the next news item (and probably this one) seems to have confused my XSLT sentence end detection algorithm, so the RSS feed is slightly broken again today. I have to figure out how to tell the difference between a period that ends a sentence and a period that follows a middle initial. Maybe I can test whether there are at least two alphabetic characters before the period? Not too many English sentences end with a single letter word (though I suppose I can imagine something like "Kernighan and Ritchie today released a new version of C.").


Suren A. Chilingaryan has released some XML parser benchmark results. The test framework is open source. As with all benchmarks, take these with a bag or two of coarse road salt. Benchmarking is tough, and it's rarely done well or correctly. I saw at least two red flags in my initial perusal of the results. (Random test data and insufficient information provided about how timing and memory usage were measured.) The benchmark is under the GPL if you care to try it for yourself.
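For comparison, here is a minimal sketch of a parser benchmark that avoids both red flags: the test document is generated deterministically, and the code states exactly how time and memory are measured. It compares two parsers from Python's standard library purely as placeholders; it is not Chilingaryan's framework and reproduces none of his numbers.

```python
import time
import tracemalloc
import xml.dom.minidom
import xml.sax


def make_document(n_items):
    """Deterministic input: the same bytes on every run, unlike random data."""
    items = "".join(f'<item id="i{i}">value {i}</item>' for i in range(n_items))
    return f"<root>{items}</root>".encode("utf-8")


def best_time(parse, data, repeats=5):
    """Fastest wall-clock time (time.perf_counter) over several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


def peak_memory(parse, data):
    """Peak bytes traced by tracemalloc during one parse.

    Measured in a separate run so tracing overhead does not skew timings.
    Only allocations made through Python's allocators are counted.
    """
    tracemalloc.start()
    try:
        parse(data)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


PARSERS = {
    "minidom": xml.dom.minidom.parseString,
    "sax": lambda data: xml.sax.parseString(data, xml.sax.ContentHandler()),
}


def run_benchmark(n_items=10_000):
    """Return {parser name: (best seconds, peak traced bytes)}."""
    data = make_document(n_items)
    return {name: (best_time(p, data), peak_memory(p, data)) for name, p in PARSERS.items()}
```

Taking the minimum of several runs reduces noise from other processes, and tracing memory in a separate run keeps tracemalloc's own overhead out of the timings.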


eSVG 1.5 is an implementation of subsets of the SVG 1.1 and SVG Mobile specifications designed for integration into embedded systems. The eSVG project additionally provides multithreaded eSVG scripting according to the SVG DOM 2 interface specification. eSVG scripting is based on the SpiderMonkey (JavaScript-C) engine and ORMIDE. Version 1.5 supports most of the SVG Tiny profile features, SVG Basic profile features, SVG DOM interface entries, and SMIL animation. eSVG currently runs on Windows 98/NT/2000/ME/XP, Windows CE, and UniOP MMI. eSVG costs $380 for 3 developer/50 runtime licenses.

Monday, February 24, 2003

The OpenOffice Project has released OpenOffice 1.0.2, a bug fix release of the open source office suite for Linux and Windows that saves all its files as zipped XML.


Bare Bones Software has released BBEdit 7.0.2. This is a free update for all 7.0 users. BBEdit is the Macintosh text/HTML/XML/programmer's editor I normally use to write this page. Besides bug fixes, the big new feature in this release is much better auto-detection of character encodings. BBEdit now makes heroic efforts to determine what character set a document is written in, using Macintosh metadata, byte order marks, XML declarations, and HTML meta headers, rather than just assuming that all files are written in the platform's default encoding. This feature has finally convinced me to upgrade from 6.0 on my main work machine. Mac OS 9.1 or later is required.
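The byte order mark and XML declaration checks mentioned above can be sketched in a few lines of Python. This covers only those two sources (not Macintosh metadata or HTML meta headers) and is an illustration, not BBEdit's algorithm; Appendix F of the XML 1.0 Recommendation describes fuller autodetection, including UTF-16 documents without a byte order mark.

```python
import codecs
import re

# Byte order marks, checked in this order so the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the shorter UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# encoding="..." or encoding='...' inside an XML declaration at the very start.
XML_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_encoding(data, default="utf-8"):
    """Guess a document's encoding from its first bytes."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    match = XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return default  # nothing found: fall back, as an editor would
```

A file starting with <?xml version="1.0" encoding="ISO-8859-1"?> comes back as "iso-8859-1", while one with no byte order mark and no declaration gets the default.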


Netscape Communications has released version 7.02 of their namesake web browser. This version is based on Mozilla 1.0.2 and includes updated Java and Flash plug-ins for Windows. As earlier versions did, it also supports XML, HTML, XHTML, CSS, XSLT, RDF, DOM, and assorted other cool acronyms.


Toni Uusitalo's posted Parsifal 0.6.6, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Parsifal is in the public domain. This is a bug fix release.

Sunday, February 23, 2003

IBM's released the Informix XSLT DataBlade, a module that creates new SQL functions (a.k.a. user-defined routines or UDRs) that transform documents from one format to another using XSLT. It's based on Daniel Veillard's libxslt, the Gnome C XSLT library.

Saturday, February 22, 2003

The cron job that generates the RSS feed for this site upchucked again this morning. Something about "xmlNanoHTTPConnectAttempt: Connect attempt timed out." which seems strange because all files are read from the file system. Oh wait, maybe it's the XHTML DTD from W3C it couldn't read. (I just figured that out as I was typing this). I should make a local copy and point to it instead. I figure that once this system runs without manual intervention for a couple of weeks, I'll add it to Cafe au Lait too, but right now I'm resetting the bug clock back to zero.
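The error message comes from libxml2's built-in HTTP client, and for libxml2-based tools the usual way to point at a local copy is an XML catalog. As an illustration of the same idea, here is a sketch using Python's xml.sax with an EntityResolver that redirects the XHTML DTD's system identifier to a local file. It assumes the XHTML 1.0 Strict identifiers, and the one-line local DTD is a stand-in; a real local copy of the XHTML DTD also needs its .ent files alongside it.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Assumed identifiers (XHTML 1.0 Strict); adjust to match the pages being processed.
XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN"
XHTML_SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"


class LocalCopyResolver(EntityResolver):
    """Serve known remote system IDs from local files; record every request."""

    def __init__(self, local_copies):
        self.local_copies = local_copies
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        local_path = self.local_copies.get(systemId)
        if local_path is None:
            return systemId  # unknown ID: passed through, may still hit the network
        source = InputSource(local_path)
        with open(local_path, "rb") as f:
            source.setByteStream(io.BytesIO(f.read()))  # read eagerly, no open handle
        return source


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtd(document, local_copies):
    parser = xml.sax.make_parser()
    # The resolver is only consulted when external entities are enabled.
    parser.setFeature(feature_external_ges, True)
    resolver = LocalCopyResolver(local_copies)
    collector = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(collector)
    parser.feed(document)
    parser.close()
    return "".join(collector.chunks), resolver.requested


def demo():
    page = (
        f'<!DOCTYPE html PUBLIC "{XHTML_PUBLIC_ID}" "{XHTML_SYSTEM_ID}">'
        "<html>caf&eacute;</html>"
    )
    with tempfile.TemporaryDirectory() as tmp:
        local_dtd = os.path.join(tmp, "xhtml1-strict.dtd")
        with open(local_dtd, "w", encoding="utf-8") as f:
            f.write('<!ENTITY eacute "&#233;">')  # tiny stand-in for the real DTD
        return parse_with_local_dtd(page, {XHTML_SYSTEM_ID: local_dtd})
```

demo() returns the text "café" along with the list of system identifiers the parser asked for, which contains only the XHTML DTD's URL, answered from the local file rather than from w3.org.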


The W3C Document Object Model Working Group has posted an updated working draft of the Document Object Model (DOM) Level 3 Events Specification. According to the abstract,

This specification defines the Document Object Model Events Level 3, a generic platform- and language-neutral event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. The Document Object Model Events Level 3 builds on the Document Object Model Events Level 2 [DOM Level 2 Events].

Changes since the last draft include splitting TextEvent into separate TextEvent and KeyboardEvent interfaces, and partially ordering event listeners.


The W3C Web Ontology Working Group has updated the Web Ontology Language (OWL) Reference Version 1.0 working draft. According to the abstract,

The Web Ontology Language OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (the Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language. This document contains a structured informal description of the full set of OWL language constructs and is meant to serve as a reference for OWL users who want to construct OWL ontologies.

Friday, February 21, 2003

Sun's posted the public review draft specification for Java Specification Request (JSR) 172, J2ME Web Services Specification, in the Java Community Process (JCP). This basically describes a subset of JAXP, JAX-RPC, and XML intended for talking to SOAP services from Java 2 Micro Edition devices. Like a lot of web services specs, it makes a lot of mistakes when it comes to XML. For instance, it confuses DTDs with document type declarations, and validation with DTD processing. It effectively defines a subset of XML that significantly hobbles XML parsing in the J2ME space. Comments are due by March 22.

Thursday, February 20, 2003

Sun's posted xmlroff 0.2.0, an open source XSL Formatting Objects to PDF converter. xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib, GObject, and Pango libraries from GTK+ and GNOME (though neither GTK+ nor GNOME is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library.


The W3C Scalable Vector Graphics (SVG) Working Group has posted the first public draft of the SVG Printing Requirements. "This document lists the design principles and requirements for the creation of a SVG specification related to printing."


SKYRiX AG has released the SKYRiX Libraries for XML processing 4.2, an open source Objective C class library for processing XML. This contains:

  • An Objective C port of SAX2
  • An Objective C wrapper for CoreFoundation XML
  • An Objective C wrapper for libxml2
  • An Objective C wrapper for libical
  • An Objective C wrapper for expat
  • An Objective C wrapper for plists
  • An Objective C wrapper for pyx
  • A DOM implementation for Objective C
  • An XML-RPC implementation

The libraries are published under the Lesser General Public License (LGPL).

Wednesday, February 19, 2003

The W3C Web Ontology Working Group has published the second public Working Draft of Web Ontology Language (OWL) Test Cases. "This document contains and presents test cases for the Web Ontology Language (OWL) approved by the Web Ontology Working Group. Many of the test cases illustrate the correct usage of the Web Ontology Language (OWL), and the formal meaning of its constructs. Other test cases illustrate the resolution of issues considered by the working group. Conformance for OWL documents and OWL document checkers is specified."

Tuesday, February 18, 2003

Toni Uusitalo's posted Parsifal 0.6.5, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Parsifal is in the public domain.


Edwin Dankert's released eXchaNGeR 1.0, an open source XML browser and editor framework, written in Java. eXchaNGeR provides both tree and source views of XML documents. It supports validation against DTDs and schemas, XPath queries, and plug-ins for specific XML applications such as SVG, XHTML, and SOAP. I've been very skeptical of most such efforts in the past, but this one looks like it might actually be useful if the performance is adequate. I'm going to have to download it and try it out. eXchaNGeR is published under the Mozilla Public License.

Update: I've tried it out. It's better than most, but still inadequate for real work. The editor only seems to work on valid documents. (Assuming validity is a very common error in XML software.) It can handle small documents, and handles large ones better than most other editors, but there are noticeable delays when loading files. eXchaNGeR is definitely not competitive speed-wise with a basic text editor such as emacs, jEdit, or BBEdit. And there are lots of little glitches and user interface inconsistencies. For instance, the menus change depending on what is or is not open. Saving files failed due to various exceptions. There are some interesting ideas here, but eXchaNGeR needs a lot more work before it's ready for prime time.


Stefan Champailler's posted DTDDoc 0.0.5, a JavaDoc-like tool for creating HTML documentation of document type definitions from embedded DTD comments. This release can display the parent of an element or attribute and can configure the title that is displayed at the top of the index. DTDDoc is published under the GPL.

Monday, February 17, 2003

The W3C XQuery working group has published two new working drafts:


The W3C Cascading Style Sheets working group has updated the working draft of CSS3 module: Color. Defined properties include:

  • color
  • opacity
  • color-profile
  • rendering-intent
  • @color-profile

This release focuses on bringing CSS in sync with colors as used in SVG.

Sunday, February 16, 2003

IBM has released Version 5.1 of XML for C++, a schema-validating XML parser based on Xerces-C 2.3. It adds support for C++ namespaces (XML Namespaces have been supported for some time now), experimental XML 1.1 support, more parts of DOM Level 3 Core, and binaries for Linux/390.

Saturday, February 15, 2003

My derogatory comments about cookies continue to elicit howls of disbelief from developers who just don't see how they can create useful web applications without cookies. A lot of people think I'm talking about hiding the cookie in the URL, but that's not really what I mean at all, nor is it necessary. Until I have time to lay out the principles in more detail, let me just state one maxim that may suggest how you need to adjust your thinking to design session-free web applications:

State is a property of the resource, not the client. All necessary state is stored in the resource itself, never on the client.

I'm afraid I don't have time to elaborate on this right now since I'm way past deadline on my next book. (I started to, but it quickly became obvious this was a lot more than a quick paragraph for this web site.) But once that book is done, I do plan to publish some articles demonstrating exactly how to do what everyone is telling me is impossible. I'll also probably be talking about this at Software Development East in Boston in September. None of this is original research, though. It's all been done before, both in practice and theory. I have not personally invented any of this. In fact, this is the way HTTP was designed to work by the people who did invent it, as opposed to the engineers at Netscape who glued together a bunch of hacks like cookies, frames, and the font tag without ever really understanding HTTP or HTML. We're finally digging out from under the rubble left by Netscape's efforts to "improve" HTML. Now it's time to start repairing the damage they did to HTTP.


Michael Kay has released Saxon 7.4, a partial and experimental implementation of XSLT 2.0 written in Java. This release introduces strong typing, though the types have to be explicitly specified using xsi:type attributes. Saxon still doesn't perform schema validation before transforming. Version 7.4 also implements XPath 1.0 backwards compatibility mode for the first time. This is for experimenters only. Most users should continue to use Saxon 6.5.3.


Oleg Tkachenko has released nxslt, a Windows command line utility for accessing the .NET XSLT engine. nxslt is written in C# and requires the .NET Framework version 1.0 to be installed.

Friday, February 14, 2003

FTD.com has revealed that a hole in their web systems potentially allowed hackers to easily access customer information including names, addresses, credit card numbers, and everything else that would be needed to place fraudulent orders. They're blaming it on poor cookie design, but that's a misdiagnosis. The real problem is that they designed a stateful, session-based application on top of the fundamentally stateless HTTP protocol. This enabled hackers to masquerade as other sessions. That they did this with cookies is almost incidental. The same problem could have occurred with URL rewriting.

Had the FTD store been designed properly with a stateless, session-free, RESTful interface that relied on HTTP authentication--in other words one that worked the way the web was designed to work instead of using session hacks--this would never have happened. A lot of web developers are so accustomed to building sites the wrong way they don't even believe the right way can possibly work, but it can and it does. Cookies are never necessary. Anything useful that can be done with cookies can be done without them including user authentication and shopping carts. Every site that relies on cookies for proper operation is broken.
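Here is a toy sketch of that design in Python. A plain function stands in for the HTTP server, and USERS, CARTS, and the /carts/{user} URI layout are hypothetical. The point is that each request carries its own credentials in an Authorization header and the cart is a resource with its own URI, so there is no session identifier to steal or guess.

```python
import base64
import binascii

# Hypothetical credential store and resource state, for illustration only.
USERS = {"alice": "wonderland"}
CARTS = {}  # cart URI -> list of item names; the state belongs to the resource


def authenticated_user(headers):
    """Return the user named in a valid HTTP Basic Authorization header, else None."""
    value = headers.get("Authorization", "")
    if not value.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(value[len("Basic "):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")
    if sep and USERS.get(user) == password:
        return user
    return None


def handle(method, path, headers, body=None):
    """Handle one request using only what that request itself carries."""
    user = authenticated_user(headers)
    if user is None:
        return 401, {"WWW-Authenticate": 'Basic realm="shop"'}, ""
    if path != f"/carts/{user}":
        return 403, {}, ""  # authenticated, but not this user's cart
    items = CARTS.setdefault(path, [])
    if method == "GET":
        return 200, {}, "\n".join(items)
    if method == "POST":
        items.append(body)
        return 303, {"Location": path}, ""  # send the client back to the cart
    return 405, {"Allow": "GET, POST"}, ""
```

Because the cart has a URI, it can be bookmarked and linked to like any other page. Basic credentials are only base64-encoded, not encrypted, so a real deployment would also need HTTPS.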


Jerome Alet has released Jaxml 3.0, "a Python module designed to ease the creation of human readable XML documents." JAXML is published under the GPL.

Thursday, February 13, 2003

The Mozilla Project has posted the first beta of Mozilla 1.3 (following an earlier alpha release). New features in this release include spam filtering and image autosizing, a really cool new idea that proves once again why competition in the browser space is important. Mozilla is available for the usual batch of platforms: Windows, Mac OS X, Linux, Solaris, OpenVMS, OS/2, etc.


Apple's posted a new beta of their Safari web browser for Mac OS X. This release can display XML pages with CSS style sheets in the browser for the first time. XSLT does not appear to be supported yet. It also improves compatibility with some Web sites, speeds up page display, fixes bugs, and adds support for self-signed security certificates. Software Update isn't showing this release yet, but you can get it directly from Apple's web site.

Safari has an interesting new user interface innovation in this release. The location bar doubles as a progress bar for page loads. I haven't seen this much innovation in the browser space in years. The long, dark domination of Internet Explorer may be finally coming to an end. :-)


Pekka Enberg's posted version 0.2.2 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release fixes bugs.


Stefan Champailler's posted DTDDoc 0.0.4, a JavaDoc like tool for creating HTML documentation of document type definitions from embedded DTD comments. This release fixes lots of bugs and improves the user interface. DTDDoc is published under the GPL.


The W3C Web Content Accessibility Guidelines Working Group has published the first public working draft of Requirements for WCAG 2.0 Checklists and Techniques. "It describes requirements for Checklists and Techniques described by the Web Content Accessibility Guidelines 2.0 (WCAG 2.0). These requirements are related to but different from Requirements for WCAG 2.0 in that 'Requirements for WCAG 2.0 Checklists and Techniques' specifies requirements for the technology-specific documents produced by the WCAG WG while 'Requirements for WCAG 2.0' specifies general requirements for the general usability of documents produced by the WCAG WG."

Wednesday, February 12, 2003

The W3C Web Ontology Working Group has updated two working drafts about the Web Ontology Language (OWL):

According to the Guide abstract,

The World Wide Web as it is currently constituted resembles a poorly mapped geography. Our insight into the documents and capabilities available are based on keyword searches, abetted by clever use of document connectivity and usage patterns. The sheer mass of this data is unmanageable without powerful tool support. In order to map this terrain more precisely, computational agents require machine-readable descriptions of the content and capabilities of web accessible resources. These descriptions must be in addition to the human-readable versions of that information.

The Web Ontology Language (OWL) is intended to provide a language that can be used to describe the classes and relations between them that are inherent in Web documents and applications.


The W3C Quality Assurance (QA) Activity has posted three last call working draft specifications on quality assurance:

These describe "a common framework for enhancing the quality practices of the W3C Working Groups in the areas of specification editing, production of test materials, and coordination efforts with internal and external groups." Comments are due by March 14.

Tuesday, February 11, 2003

I've set up a cron job to update the RSS Feed approximately hourly, during the usual hours when I work on this site. Another change necessitated by the use of RSS is that this page is now invalid XHTML. I needed to use the XHTML DTD to prevent the XSLT processor from choking on entity references like &eacute;, and that in turn required using the XHTML namespace. However, the page should still be served as plain HTML to most browsers.
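The entity problem is easy to reproduce with Python's xml.dom.minidom. In this sketch an internal DTD subset declaring a single entity stands in for the full XHTML DTD (an assumption made to keep the example self-contained): without any declaration, &eacute; is an undefined entity and parsing fails; with it, the reference expands to é.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

WITHOUT_DTD = "<p>caf&eacute;</p>"
# Stand-in for the XHTML DTD: an internal subset that declares just one entity.
WITH_DTD = '<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>caf&eacute;</p>'


def parse_text(markup):
    """Parse markup and return the root element's text, or None if parsing fails."""
    try:
        doc = minidom.parseString(markup)
    except ExpatError:
        return None  # e.g. "undefined entity"
    return doc.documentElement.firstChild.data
```

parse_text(WITHOUT_DTD) returns None, while parse_text(WITH_DTD) returns "café".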


I've posted version 1.0d10 of XOM, my new XML Object Model for Java. This release focuses on namespaces. First, it fixes all the known bugs in namespace handling. It also makes one API change. The declareNamespace method is once again addNamespaceDeclaration. Under the hood, however, there are much more significant changes in namespace handling, and these are likely to break some existing applications. XOM is published under the LGPL.


FOA (Formatting Object Authoring tool) 0.5 has been released. FOA is a Java application "that gives users a graphical interface to author XSL-FO stylesheets. With FOA you can generate pages, page sequences and fill them with content provided into one or more XML files. FOA will generate the XSLT stylesheet that transforms the XML content into an XSL-FO document." New features in 0.5.0 include bricks for block and inline containers, footnotes, floats, external (foreign) objects, and leaders as well as support for table captions, hyphenation, and background images.


IBM has updated their XML Parser for Java to version 4.1.4. This release is probably based on Xerces-J 2.3.0 (the web page is unclear, but the timing of this release suggests that) and supports the W3C XML Schema Recommendation 1.0, SAX 1.0 and 2.0, DOM Level 1, DOM Level 2, and some experimental features of DOM Level 3 Core and Load/Save Working Drafts, JAXP 1.2, and XNI.

Monday, February 10, 2003

Happy Fifth Birthday XML! Wow. Has it only been five years? It feels like longer, I guess because I got involved about a year before the final release. Still, you've accomplished a lot more in five years than most acronyms do in 50. Here's to the next five years!

I think I need a birthday present for you. How about this? Cafe con Leche now has an RSS feed. A lot of readers have asked me for this; and I've resisted, mostly because I didn't want to make Cafe con Leche look exactly like every other RSS site on the planet. Stories here are divided by day, not story, and don't have titles. RSS really doesn't fit that structure very well; and I didn't want to waste extra time devising titles for each story just to fit RSS.

However, this weekend I had the brain flash that the first sentence of each story, while it might not be a classic title, was certainly good enough for stuffing in RSS. So I decided to whip up a quick XSLT stylesheet that would extract the first sentence of each story and put it into an RSS file.

I anticipated that finding the sentence boundaries would be tricky, and it was, but not too tricky. What really slowed me down was finding the story boundaries. Each story on Cafe con Leche is separated from the next by a horizontal rule. That is, the structure looks something like this:

<today> story <hr/> story <hr/> story <hr/> story </today>

Each story is marked up in HTML. View source if you want more details. About the first dozen things I tried, ranging from simple to complex, failed to produce the desired results. This is one use case where XSLT 2.0 would have helped a lot: << and >> operators, node-set intersections, templates creating node-sets rather than result tree fragments; all of these would have made the problem much easier. I could have used some extension functions, but I knew it had to be possible in XSLT 1.0 (it's Turing complete, after all), and after a few hours I finally came up with a recursive template that did the trick by counting the number of hr elements that were found in a node's following-sibling axis.
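The following-sibling counting trick described above has a direct Python analogue using xml.dom.minidom: every node in the same story has the same number of hr elements after it, so that count can serve as a grouping key. This is a sketch of the idea, not the original XSLT template; split_stories and the sample markup are illustrative.

```python
from xml.dom import Node, minidom


def is_hr(node):
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == "hr"


def following_hr_count(node):
    """Python analogue of XPath count(following-sibling::hr)."""
    count = 0
    sibling = node.nextSibling
    while sibling is not None:
        if is_hr(sibling):
            count += 1
        sibling = sibling.nextSibling
    return count


def split_stories(today):
    """Group the children of <today> into stories separated by <hr/>.

    Nodes in the same story have the same number of hr elements after
    them, so that count works as a grouping key (quadratic, but simple).
    """
    groups = {}
    for node in today.childNodes:
        if not is_hr(node):
            groups.setdefault(following_hr_count(node), []).append(node)
    # Earlier stories have more hr elements after them.
    return [groups[key] for key in sorted(groups, reverse=True)]
```

Given <today><p>One.</p><hr/><p>Two.</p><p>More.</p><hr/><p>Three.</p></today>, this yields three stories, the second containing both of its paragraphs.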

There are still some things I need XSLT 2 or extension functions for, notably determining the base URI of the document being transformed and getting the current date and time. (Possibly I can hack the latter using the document() function if I can find or create an HTTP server that serves the current time in XML. Interestingly, a SOAP service would not work here, but a RESTful one would.) If I had these two functions, I could write one stylesheet that worked for both Cafe con Leche and Cafe au Lait. Right now, though, I think I'm going to debug and test this here on Cafe con Leche before adding one to Cafe au Lait. I just noticed the title of the page doesn't seem to be getting through. OK, I found my mistake there. A DTD or schema would have caught that earlier. Now I notice RSS 0.91 has some size limits I may have to work around. Hmm, it seems like RSS 0.92 eliminates them, so upgrading the version number fixes that problem. Done.
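Here is a minimal sketch of writing an RSS 0.92 channel with Python's xml.etree.ElementTree, with each item's title coming from something like the first sentence of a story. The channel fields and item links are placeholders, and only a few of the elements RSS 0.92 allows are shown.

```python
import xml.etree.ElementTree as ET


def build_rss(channel, items):
    """Serialize a minimal RSS 0.92 feed.

    channel: dict with "title", "link", and "description" strings
    items: iterable of (title, link) pairs, e.g. first sentences of stories
    """
    rss = ET.Element("rss", version="0.92")
    chan = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description"):
        ET.SubElement(chan, name).text = channel[name]
    for title, link in items:
        item = ET.SubElement(chan, "item")
        ET.SubElement(item, "title").text = title  # no 0.91-style length cap applied
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

ElementTree escapes any markup characters in the titles, so a first sentence containing "<" or "&" still produces a well-formed feed.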

I still need to figure out how I'm going to automate this. The simplest solution is probably just using a cron job to invoke the stylesheet. If you look at this site with RSS right now, you're actually seeing yesterday's news.

Sunday, February 9, 2003

Pekka Enberg's posted version 0.2.1 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release adds line wrapping support and fixes bugs.


Topologi has released the Collaborative Markup Editor 1.1.1 (CME), a $AUS105 payware source level, non-tree XML editor. CME is written in Java and runs on Mac OS X, Windows, and Linux. New features in this release include:

  • Search and replace
  • Comprehensive tools for whitespace handling, line wrapping and character encoding
  • RELAX NG compact syntax
  • Examplotron
  • RELAX NG compact syntax and Examplotron
  • Peer-to-peer support allows users to annotate and swap screenshots
  • Read from and export to ZIP files
  • RTF style-mapping and import
  • Java class mini-browser and API documentation

Version 1.6.1 of AxKit, the Perl-based XML Application Server Framework for Apache, has been released. AxKit converts XML to other formats such as HTML, WAP and text on the fly using either W3C standard techniques like XSLT and XInclude or custom code. This release fixes bugs and adds a few small features, including a new AxExternalEncoding option that allows you to have filesystems and other external resources that aren't stored as UTF-8, and a finalized AxHandleDirs option that allows Apache's directory handlers to return XML representing the directory to AxKit for further processing.

Saturday, February 8, 2003

The W3C Multimodal Interaction Working Group has published a note on the Requirements for the Ink Markup Language. According to the abstract,

The Ink Markup Language will serve as the data format for representing ink entered with an electronic pen or stylus in a multimodal system. The markup will allow for the input and processing of handwriting, gestures, sketches, music and other notational languages in web-based multimodal applications. In the context of the W3C Multimodal Interaction Framework, the markup provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.

The W3C HTML Activity has published the candidate recommendation of XML Events, a module that "provides XML languages with the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 event interfaces" in order to associate behaviors with elements. The Working Group has published a diff-marked version of this spec, but if there've been any changes since the last working draft, I didn't see them.


Friday, February 7, 2003

R.V.Guha of IBM and Patrick Hayes of the University of West Florida have published a W3C note on LBase: Semantics for Languages of the Semantic Web. According to the abstract,

This document presents a framework for specifying the semantics for the languages of the Semantic Web. Some of these languages (notably RDF [RDF-PRIMER] [RDF-VOCABULARY] [RDF-SYNTAX] [RDF-CONCEPTS] [RDF-SEMANTICS], and OWL [OWL]) are currently in various stages of development and we expect others to be developed in the future. This framework is intended to provide a framework for specifying the semantics of all of these languages in a uniform and coherent way. The strategy is to translate the various languages into a common 'base' language thereby providing them with a single coherent model theory.

We describe a mechanism for providing a precise semantics for the Semantic Web Languages (referred to as SWELs from now on. The purpose of this is to define clearly the consequences and allowed inferences from constructs in these languages.

Hmmm, let me see if I understand this. XML was supposed to provide semantics to the Web, but it wasn't enough so RDF was invented. But it turned out that RDF wasn't enough either, so OWL was invented. Then it turned out that OWL wasn't enough so now we have SWELs. I'm beginning to understand Clay Shirky's point that it's turtles all the way up.


The XML Apache Project has released Xerces C++ 2.2.0, a schema-validating XML parser written in fairly portable C++. New features in this release include C++ Namespace Support, an experimental implementation of XML 1.1, more DOM Level 3 Core support, and various other small improvements and bug fixes.


The W3C has released Amaya 7.2, an open source web browser and authoring tool for Solaris, Linux, and Windows that supports HTML, XHTML, XML, CSS, MathML, and SVG. This is mostly a bug fix release.

Thursday, February 6, 2003

The W3C DOM Working Group has posted a new last call working draft of DOM Level 3 Validation that addresses various issues raised during the last call period.


Daniel Veillard's released version 2.5.2 of libxml2, the XML C library for Gnome and version 1.0.25 of libxslt, the GNOME XSLT library for C and C++. The new version of libxml2 adds preliminary support for RELAX NG, as well as fixing a few bugs. The new version of libxslt fixes a few bugs.


Wednesday, February 5, 2003

The W3C Quality Assurance Team has posted two useful notes on Common User Agent Problems and Common HTTP Implementation Problems. The former should be required reading for anyone attempting to write a browser. In particular, the IE team at Microsoft really needs to read this, though they're hardly the only group out there that can benefit.

The second note should be read by anyone attempting to write a web server of any kind, whether a traditional web server like Apache, or some sort of application server like WebObjects. In particular, the section on URIs is required reading for Apple's web team, the WebObjects developers, and all the subscribers to the webobjects-talk mailing list who just noticed that I called their favorite software a piece of animal feces back in October (Actually, I was more polite than that; but I shouldn't have been.) and who persist in bombarding me with e-mails that only prove that they really don't understand HTTP. Here are a few clues for the clueless:

  • Every URL should be bookmarkable and linkable.
  • HTTP is stateless by design.
  • The difference between GET and POST has nothing to do with how many characters broken browsers can stuff into a URL.
  • Cookies are unnecessary for user authentication, shopping carts, or anything else.
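To make the first two points concrete, here is a minimal Python sketch, assuming a hypothetical catalog search at www.example.com, that keeps all of the request's state in a GET query string instead of a server-side session or cookie. Because the URL alone carries the state, it can be bookmarked, linked to, and replayed. The function names are illustrative only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def search_url(base: str, params: dict) -> str:
    """Encode all request state in the query string so the URL is bookmarkable."""
    return base + "?" + urlencode(params)

def params_from_url(url: str) -> dict:
    """Recover the same state from the URL alone; no session or cookie is needed."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

# Hypothetical resource; urlencode encodes the space as '+'.
url = search_url("http://www.example.com/catalog", {"q": "xml parser", "page": 2})
print(url)                   # http://www.example.com/catalog?q=xml+parser&page=2
print(params_from_url(url))  # {'q': 'xml parser', 'page': '2'}
```

Any server that receives this URL, today or a year from now, has everything it needs to regenerate the page.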

Yes, more developers and more software operate the wrong way than the right way. This includes WebObjects, though it's hardly alone here. There are many other broken sites and broken software that behave similarly. Good library software is implemented by experts who really know their field. The software does the right thing so the developers who use it don't have to be experts in the field. WebObjects is not good software. It was obviously designed by programmers who were not HTTP experts. They may have known a lot of things about GUI design, databases, programmer productivity, and the like; but they didn't know HTTP. Consequently, they made a lot of novice mistakes that are now propagated by almost every web designer using WebObjects. WebObjects makes it easy to do the wrong thing.

If you want to argue that the Web architecture is a bad architecture, that HTTP should allow state, that unbookmarkable URLs are a good idea, and so forth, feel free; but first you have to understand what you're criticizing. Do the necessary homework to understand what the web architecture you're arguing against is (and of course whether you really want to be arguing against it in the first place). Otherwise you'll end up looking as foolish to people who do understand the web architecture as the early 20th century etiquette books, which suggested that you write a letter to anyone you wanted to telephone to find out when would be an appropriate time to call, look to someone who actually understands the phone system. Here are a few places you might want to start:

  1. Web Architecture from 50,000 feet (www.w3.org)
  2. Second Generation Web Services [Feb. 06, 2002] (www.xml.com)
  3. REST and the Real World [Feb. 20, 2002] (www.xml.com)
  4. Google's Gaffe [Apr. 24, 2002] (www.xml.com)
  5. FrontPage - RESTwiki (internet.conveyor.com)
  6. Architectural Styles and the Design of Network-based Software Architectures (www.ics.uci.edu)

Opera Software ASA has released version 7.0.1 of their namesake web browser for Windows to fix a few security problems found in the previous release. All 7.0 users should upgrade. Interestingly, I was flipping through the channels on my TV last night when I noticed this scrolling ticker on one of the news stations: "Non-Microsoft Security Hole Found." What's next? Man bites dog?


Peter J. Jones has posted xmlwrapp 0.4.0, a C++ library for working with XML built on top of Daniel Veillard's libxml2. This release adds support for XSLT. xmlwrapp is published under a BSD license.

Tuesday, February 4, 2003

Hugues Cassé has released Elf 0.1, a port of XOM to Python, somewhat to my surprise.


The W3C Web Ontology Working Group has updated the working drafts of Web Ontology Language (OWL) Abstract Syntax and Semantics.


IDEALX has released DocBook2LaTeX 1.0, a program that translates DocBook documents into LaTeX. DocBook2LaTeX is written in Perl, and depends on AxKit's XML::XPathScript. It supports tables, indices, figures, footnotes, and more.


The Gnome project has posted librsvg 2.2.2.1, an SVG rendering library written in ANSI C for Linux and Unix. It's not complete yet, but it does a lot. librsvg is published under the LGPL.


Eric van der Vlist has posted version 0.7 of Examplotron, a schema language that uses a single attribute to describe valid content. Version 0.7 is designed as "two tiers: the 'iconic 80%' which is based on the original idea that annotated instance documents can be considered as highly intuitive schemas and the 'complemental 20%' which will be introduced later on by 'importing' the Relax NG patterns in the Examplotron namespace."


Pekka Enberg's posted version 0.2.0 of XML Indent, an open source (GPL) "XML stream reformatter written in ANSI C" that "is analogous to GNU indent." This release has been revised to use Flex, and now allows you to force a line break before and after start- and end-tags.

Monday, February 3, 2003

The W3C Resource Description Framework (RDF) Core Working Group has published six last call working drafts:

RDF Primer
According to the abstract, RDF "is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a 'Web resource', RDF can also be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. This Primer is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents."
Resource Description Framework (RDF): Concepts and Abstract Syntax
"This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, meaning of RDF documents, key concepts, datatyping, character normalization and handling of URI references."
RDF Semantics
"This is a specification of a precise semantics for RDF and RDFS, and of corresponding entailment and inference rules which are sanctioned by the semantics."
RDF Vocabulary Description Language 1.0: RDF Schema
"This specification describes how to use RDF to describe RDF vocabularies. This specification defines a vocabulary for this purpose and defines other built-in RDF vocabulary initially specified in the RDF Model and Syntax Specification."
RDF/XML Syntax Specification (Revised)
"This document defines an XML syntax for RDF called RDF/XML in terms of XML Namespaces, the XML Information Set and XML Base. The formal grammar for the syntax is annotated with actions generating triples of the RDF Graph as defined in RDF Concepts and Abstract Syntax. This is done using the N-Triples RDF Graph serializing format which enables more precise recording of the mapping in a machine processable form. The mappings are recorded as tests cases, gathered and published in RDF Test Cases."
RDF Test Cases
This document describes a set of machine-processable test cases for RDF, though it does not contain the test cases themselves, which are available separately.

Comments on all six are due by February 21.
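The RDF/XML syntax draft records its mappings as N-Triples, one subject-predicate-object triple per line. As a rough illustration of that format, here is a minimal Python serializer sketch. It assumes ASCII-only literals (the N-Triples grammar in RDF Test Cases also requires escaping non-ASCII characters, which this sketch omits), and the example page URI is hypothetical; the Dublin Core namespace is real.

```python
# Minimal N-Triples serializer sketch: URIs in <...>, literals in "...",
# each triple on its own line terminated by " .".

def uri(u: str) -> str:
    """Serialize a URI reference as an N-Triples term."""
    return f"<{u}>"

def literal(s: str) -> str:
    """Serialize a plain literal, escaping backslash, double quote, newline,
    and carriage return. Assumption: ASCII-only text (non-ASCII escaping omitted)."""
    escaped = (s.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples) -> str:
    """Each (subject, predicate, object) becomes 'S P O .' on its own line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)

DC = "http://purl.org/dc/elements/1.1/"         # Dublin Core element namespace
page = uri("http://www.example.com/news.html")  # hypothetical resource

doc = to_ntriples([
    (page, uri(DC + "title"), literal("2003 XML News")),
    (page, uri(DC + "date"), literal("2003-02-03")),
])
print(doc, end="")
```

The two triples describe the title and date of one resource, the kind of metadata the RDF Primer has in mind.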

Sunday, February 2, 2003

The W3C Web Services Description Working Group has published updated working drafts of the Web Services Description Language (WSDL) Version 1.2 and the Web Services Description Language (WSDL) Version 1.2: Bindings. According to the binding draft,

The Web Services Description Language WSDL Version 1.2 (WSDL) [WSDL 1.2] defines an XML grammar [XML 1.0] for describing network services as collections of communication endpoints capable of exchanging messages. WSDL service definitions provide documentation for distributed systems and serve as a recipe for automating the details involved in applications communication. WSDL 1.2 Bindings (this document) defines binding extensions for the following protocols and message formats:
  • SOAP Version 1.2 [SOAP 1.2 Part 1: Messaging Framework] (see 2. SOAP Binding).
  • HTTP/1.1 GET/POST [IETF RFC 2616] (see 3. HTTP GET and POST Binding).
  • MIME [IETF RFC 2045] (see 4. MIME Binding).

Antenna House has released version 2.4 of their XSL Formatter, a Windows program that can display XSL-FO documents on-screen in a GUI or create PDF files for an extra charge. Version 2.4 adds support for Type1 fonts with PDF output and JPEG images with CMYK color and fixes various bugs. Pricing starts around $1000 and climbs rapidly from there depending on platform and options. This is all way overpriced to start with, but what's particularly galling is that the Windows server version costs $1498 more than the Linux server version and the Solaris server version costs $2000 more than the Linux version.

Saturday, February 1, 2003

The W3C XHTML Working Group has published a new working draft of XHTML 2.0. XHTML 2.0 is the next, backwards incompatible version of HTML that incorporates XFrames, XForms, and lots of other crunchy XML goodness. However, XLink is not yet included and may never be. (The HTML Working Group are extreme XLink skeptics.) It is not obvious to me what changed in this draft.


The W3C Voice Browser Working Group has published the candidate recommendation of Voice Extensible Markup Language (VoiceXML) Version 2.0. VoiceXML is an XML application "designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications."


IBM's alphaWorks and Opera ASA have released the Multimodal Browser and Toolkit, a development environment and runtime for multimodal applications written to the W3C XHTML+Voice (X+V) note.


alphaWorks has also updated the Multimodal Browser Extension to version 3.1. This is an Internet Explorer plug-in that renders multimodal applications written according to the W3C XHTML+Voice (X+V) note. "This technology, which includes IBM's automatic speech recognition and text-to-speech engines, allows testing of voice-enabled Web applications written in the X+V language." Windows 2000 or XP is required.

Friday, January 31, 2003

www.elharo.com is back up today. The static web site is running. I still have to restore the web services on the box. This time I'm going to try to use the real Tomcat instead of Sun's hacked version for the Cobalt Qube to see if I can get the URLs I published in Processing XML with Java to work again.


The Apache XML Project has published the first public version of the Web Services Invocation Framework (WSIF) 2.0. WSIF is a Java API for invoking remote services described by the Web Services Description Language. WSIF developers interact with Web Services at the abstract level through their WSDL descriptions. This is done independently of APIs specific to a message format or network protocol such as SOAP. According to the announcement,

With WSIF, developers work with the same programming model regardless of how the Web service is implemented and accessed. WSIF achieves this with a pluggable architecture with protocol-specific "providers" to handle invocations according to a specific protocol.

Apache WSIF 2.0 comes bundled with providers for SOAP (using Apache SOAP or Axis), local java classes, EJBs, JMS services and applications accessible via Java Connectors. WSIF also describes the specific WSDL extensions used to make these kinds of applications accessible as WSDL-described services.

WSIF allows stubless or completely dynamic invocation of a Web service, based upon examination of the meta-data about the service at runtime. It also allows updated implementations of a binding to be plugged into WSIF at runtime, and it allows the calling service to defer choosing a binding until runtime.
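The pluggable-provider design described above can be illustrated in a few lines. This is not WSIF's Java API; it is a hypothetical Python sketch of the same pattern, in which the calling code stays identical while the protocol-specific provider is looked up by binding name at runtime. The binding name "local" and the `local_provider` function are invented stand-ins (roughly analogous to a local-class binding).

```python
from typing import Callable, Dict

# A hypothetical "provider" knows how to invoke an operation over one protocol.
Provider = Callable[[str, dict], dict]

class ServiceInvoker:
    """Same calling code regardless of binding; the provider is chosen at runtime."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, binding: str, provider: Provider) -> None:
        # Providers can be plugged in (or replaced) while the program runs.
        self._providers[binding] = provider

    def invoke(self, binding: str, operation: str, args: dict) -> dict:
        try:
            provider = self._providers[binding]
        except KeyError:
            raise LookupError(f"no provider registered for binding {binding!r}") from None
        return provider(operation, args)

def local_provider(operation: str, args: dict) -> dict:
    """Hypothetical in-process provider."""
    if operation == "add":
        return {"result": args["a"] + args["b"]}
    raise ValueError(f"unknown operation {operation!r}")

invoker = ServiceInvoker()
invoker.register("local", local_provider)
print(invoker.invoke("local", "add", {"a": 2, "b": 3}))  # {'result': 5}
```

Swapping in a SOAP or JMS provider would mean registering a different function under another binding name; the `invoke` call site would not change.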


It hit me suddenly today that a lot of different communities intensely dislike web services for similar, but totally different reasons:

  • The HTTP community detests SOAP/XML-RPC/Web Services because they violate the fundamental design of HTTP.
  • The XML community detests SOAP/XML-RPC because they violate the fundamental design of XML.
  • The security community detests SOAP/XML-RPC/Web Services because they violate fundamental network security principles.

It's interesting that the Web Services community has managed to alienate three different communities for three different reasons that all derive from not understanding or accepting the basic principles of the technologies they're building on. They're either geniuses or idiots. My money's on idiots, but time will tell.

Thursday, January 30, 2003

Ernst de Haan's posted xmlenc 0.27, an open source library for streaming XML output. It's marginally more convenient than System.out.println(). However, it does not guarantee well-formedness of the output, which to my mind is a sine qua non for any XML output library.
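For a sense of what "guaranteeing well-formedness" involves, here is a minimal Python sketch (not xmlenc, which is Java) built on the standard library's `xml.sax.saxutils.XMLGenerator`. The generator escapes `&`, `<`, and `>` in text and quotes attribute values; the hypothetical `CheckedWriter` wrapper adds a stack so mismatched or unclosed tags raise errors instead of producing broken output. It still does not check name legality, multiple root elements, or characters XML forbids, so treat it as a partial guarantee.

```python
import io
from xml.sax.saxutils import XMLGenerator

class CheckedWriter:
    """Sketch: text escaping via XMLGenerator plus enforced tag balancing."""

    def __init__(self) -> None:
        self.out = io.StringIO()
        self.gen = XMLGenerator(self.out, encoding="utf-8")
        self.stack = []          # names of currently open elements
        self.gen.startDocument() # writes the XML declaration

    def start(self, name, attrs=None):
        self.stack.append(name)
        self.gen.startElement(name, attrs or {})

    def text(self, s):
        if not self.stack:
            raise ValueError("text outside the root element")
        self.gen.characters(s)   # '&', '<', '>' are escaped here

    def end(self, name):
        if not self.stack or self.stack.pop() != name:
            raise ValueError(f"mismatched end tag </{name}>")
        self.gen.endElement(name)

    def result(self) -> str:
        if self.stack:
            raise ValueError(f"unclosed elements: {self.stack}")
        self.gen.endDocument()
        return self.out.getvalue()

w = CheckedWriter()
w.start("note", {"lang": "en"})
w.text("AT&T <rocks>")
w.end("note")
print(w.result())
```

This prints the XML declaration followed by `<note lang="en">AT&amp;T &lt;rocks&gt;</note>`.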


IBM's alphaWorks has updated their XML Security Suite to support the final recommendations of XML Encryption and Decryption Transforms. This release also fixes a few bugs and improves performance of exclusive XML canonicalization.


The Jakarta Apache Project has posted the first beta of JXPath 1.1, a class library that "applies XPath expressions to graphs of objects of all kinds: JavaBeans, Maps, Servlet contexts, DOM etc, including mixtures thereof." New features in version 1.1 include:

  • JDOM support
  • A getNode() method that returns the raw value without converting it to a String.
  • DynaBeans support.
  • The format-number() function from XSLT.

Wednesday, January 29, 2003

www.elharo.com is down due to damage caused by a brief power outage in my apartment yesterday (compounded by an incredibly poor user interface for shutting down the server). I'll probably have to rebuild its hard drive later today.


Wattle Software has released XMLwriter 2.0, a $79 payware XML editor for Windows. New features in this release include:

  • Intelligent entry helpers
  • Customizable code snippets
  • Auto-completion of XML tag pairs
  • W3C XML Schema support

The W3C SVG working group has posted the first draft of the SVG Roadmap that lists estimated dates for future SVG developments. They've also publicly revealed their charter for the first time.


Altova has released a new version of XMLSpy 5.0, a $990 payware XML editor for Windows. (If you want support, it will cost you $198 more.) Annoyingly, they don't seem to have bothered to change the version number. New features in this release include C# code generation, support for the Oracle XML DB, a graphical XSL-FO designer, and a WSDL documentation generator.


The W3C Cascading Style Sheets Working Group has posted a new working draft of Cascading Style Sheets, level 2 revision 1, that is, CSS 2.1. Unusually for a new spec, this goes backwards from the previous version. It focuses on removing properties from CSS2 rather than adding them. The impetus for removal is the failure of browser vendors to implement them. Features removed include:

  • font-stretch
  • font-size-adjust
  • Aural style sheets
  • text-shadow

In addition, CSS 2.1 continues to support media-specific style sheets, content positioning, table layout, internationalization and some properties related to user interface. It also "corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's 'style' attribute and a new calculation of the 'clip' property)."

This release restores the previously removed support for named counters and page margins.


Eric van der Vlist has posted version 0.6 of Examplotron, a schema language that uses a single attribute to describe valid content. Version 0.6 adds an empty pattern to empty elements.


Andy Clark has released an update for the CyberNeko Tools for XNI that works with the recently released Xerces-J 2.3.0. This is a collection of XML tools written specifically to take advantage of the Xerces Native Interface API in Xerces2 including the NekoHTML parser and the NekoDTD parser.

Tuesday, January 28, 2003

I've returned from Munich. I only had one opportunity to check my e-mail while I was away, so my Inbox has gone from a svelte 485 messages to a bloated 1304 (and that's after spam filters have been applied), but I hope to catch up today. I suppose I could have checked my e-mail more often, but I preferred to spend my free time sightseeing, eating sausages, and drinking beer. :-). I have posted the notes from all three talks I gave at the OOP 2003 conference while I was there:


Opera Software has released version 7 of their namesake web browser for Windows that supports XML, XHTML, and CSS. New features in version 7.0 include Fast Forward; a one-click log-in password manager; Spatial Navigation; a links panel that displays all links in the current page; one-click skin install; powerful new panel management; and multiple user style sheets. Opera is $39 payware/free-beer adware.


James Clark has released a new version of Trang, an open source Java tool that translates schemas written in RELAX NG into different formats. In particular, it can

  • translate a RELAX NG schema in the compact syntax into the XML syntax
  • translate a RELAX NG schema in the XML syntax into the compact syntax
  • translate a RELAX NG schema in either the XML or compact syntax into a DTD
  • translate a RELAX NG schema in either the XML or compact syntax into a W3C XML Schema

The latest release adds support for translating DTDs into W3C XML Schema Language schemas. It can even convert parameter entities into the higher-level semantic constructs available in XSD such as simple types, groups, and attribute groups.


Eric van der Vlist has posted version 0.5 of Examplotron, a schema language that uses a single attribute to describe valid content. Version 0.5 keeps roughly the same features as 0.4 but is built on a totally different architecture. Instead of being compiled into an XSLT transformation, Examplotron schemas are now compiled into Relax NG schemas (with embedded Schematron rules when needed).


The XML Apache Project has released version 2.3.0 of the open source Xerces-J XML parser for Java. As of version 2.3.0, the Xerces Native Interface (XNI) core and parsers packages are declared finished. You should be able to rely on these APIs going forward. This release also brings Xerces-J into compliance with the latest working drafts of DOM Level 3 Core and Load/Save, though the relevant classes are still in the org.apache.xerces.dom3 package instead of org.w3c.dom. It introduces many fixes to bring Xerces-J's behaviour into line with XML Schema errata. This release adds experimental support for XML 1.1. Finally, Xerces-J can be configured to reject documents that implement the billion laughs attack (though it's not clear exactly how this is accomplished. Perhaps by just rejecting all documents that contain a document type declaration? That seems unnecessarily harsh to me.)
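The "reject anything with a document type declaration" approach speculated about above is easy to sketch. This is not how Xerces-J implements its protection; it is a hypothetical Python example using the standard library's expat binding, where a `StartDoctypeDeclHandler` that raises an exception aborts the parse before any entity declarations are processed. The two-level entity payload below is a deliberately tiny, harmless version of the billion laughs pattern.

```python
import xml.parsers.expat

class DoctypeRejected(Exception):
    """Raised when a document contains a document type declaration."""

def parse_without_dtd(data: bytes) -> int:
    """Parse data, refusing any document with a DOCTYPE.
    Returns the number of elements seen."""
    parser = xml.parsers.expat.ParserCreate()
    count = 0

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at the start of <!DOCTYPE ...>, before any entity declarations.
        raise DoctypeRejected(f"DOCTYPE {name} not allowed")

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # exceptions raised in handlers propagate out of Parse
    return count

# Harmless two-level version of the entity-expansion pattern.
LAUGHS = (b'<?xml version="1.0"?>'
          b'<!DOCTYPE lolz [<!ENTITY lol "lol"><!ENTITY lol2 "&lol;&lol;">]>'
          b'<lolz>&lol2;</lolz>')

print(parse_without_dtd(b"<a><b/><c/></a>"))  # 3
```

As the paragraph notes, this is harsh: it also rejects perfectly innocent documents that merely reference a DTD.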

Thursday, January 23, 2003

I've arrived in Munich. The beer is good, though not as wonderful as I'd been told. The food, however, is fantastic, especially the sausages. I'm in the OOP 2003 speakers' lounge right now. They've got quite good Internet access here. Unfortunately I've accidentally left my quotes file at home (I'm getting quite tired of file transfer programs that silently die when they fail to copy everything you've asked them to copy. I especially don't see why a program should give up on files J-Z just because it couldn't copy file I.) so unless someone here says something memorable in English (most of the talks are in German) the quote probably won't get updated until next week. :-(

However, I have posted the notes for today's talks:

Monday, January 20, 2003

I'm going to Munich this week for the OOP 2003 conference. I'm not sure what sort of Internet access I'll have while I'm there. In any case, updates are likely to be a little slow until I return.


I've posted version 1.0d9 of XOM, my open source, tree-based API for processing XML with Java. This release cleans up a few asymmetries in the API and fixes a couple of bugs. There are no major new features in this release. XOM is published under the LGPL.


Once again, I'll be chairing the XML track for Software Development 2003 East in Boston this September. The Call for Papers is now live. Submissions need to be in by Valentine's Day (February 14).

For the XML track, we're interested in practical sessions covering all aspects of XML. This is not specifically an XML show, so we tend to find that our audience responds better to more practical, how-to, basic sessions as opposed to more theoretical, high-level sessions. For instance, a simple introduction to XQuery would go over better than a detailed comparison of XQuery optimization techniques. One thing previous attendees have told us is that they'd like to see more new sessions at each show, so we're going to be looking preferentially for talks that have not previously been given at SD East.


The Mozilla project has released version 1.0.2 of the Mozilla web browser, a bug fix release on the 1.0 trunk. According to the release notes, "Mozilla 1.0.2 contains stability and security improvements. 1.0.2 also has fixes for standards support, UI correctness and polish, performance, and site compatibility." Mozilla 1.0.2 runs under Mac OS 8.5 and later, Windows 95 and later, Linux, OpenVMS 7.1 and later, and Solaris 8.

Sunday, January 19, 2003

The W3C Multimodal Interaction Working Group has published a note on Requirements for EMMA. EMMA is the "Extensible MultiModal Annotation language." According to the abstract, "EMMA is intended as a data format for the interface between input processors and interaction management systems. It will define the means for recognizers to annotate application specific data with information such as confidence scores, time stamps, input mode (e.g. key strokes, speech or pen), alternative recognition hypotheses, and partial recognition results, etc. EMMA is a target data format for the semantic interpretation specification being developed in the Voice Browser Activity, and which describes annotations to speech grammars for extracting application specific data as a result of speech recognition. EMMA supercedes earlier work on the natural language semantics markup language in the Voice Browser Activity."

Saturday, January 18, 2003

Simon St. Laurent has founded the xml-hypertext mailing list, "an open forum for the discussion of creating hypertext with XML. Appropriate subjects include technologies for linking and pointing, hypertext-oriented transformations, and interactions between XML and Web infrastructure."

Friday, January 17, 2003

James Clark has released a new version of Trang, an open source Java tool that translates schemas written in RELAX NG into different formats. In particular, it can

  • translate a RELAX NG schema in the compact syntax into the XML syntax
  • translate a RELAX NG schema in the XML syntax into the compact syntax
  • translate a RELAX NG schema in either the XML or compact syntax into a DTD
  • translate a RELAX NG schema in either the XML or compact syntax into a W3C XML Schema

Norm Walsh has posted version 2.0.8 of DocBook: The Definitive Guide. According to Walsh, this is a work in progress that "purports to document DocBook V4.2 with the EBNF, HTML Forms, MathML, and SVG modules. As it is being actively updated, it may be inconsistent in some areas."

Thursday, January 16, 2003

Simon St. Laurent's posted a Skunkworks out-of-line linking proposal called VELLUM (Very Extensible Linking Language Unafraid of Markup).


Wolfgang Meier of the Darmstadt University of Technology has posted version 0.9 of eXist, an open source native XML database that supports fulltext search. XML can be stored in either the internal, native XML-DB or an external relational database. The search engine has been designed to provide fast XPath queries, using indexes for all element, text and attribute nodes. The server is accessible through HTTP and XML-RPC interfaces and supports the XML:DB API for Java programming. Version 0.9 focuses on performance and scalability improvements. eXist is published under the LGPL.

Wednesday, January 15, 2003

The World Wide Web Consortium (W3C) has issued the final recommendations of Scalable Vector Graphics (SVG) 1.1 and Mobile SVG Profiles: SVG Tiny and SVG Basic. SVG is an XML format for line art. "SVG 1.1 separates the SVG language into reusable building blocks. Mobile SVG re-combines them optimized for cellphones and pocket computers."


Daniel Veillard's released version 1.0.24 of libxslt, the GNOME XSLT library for C and C++. The new version of libxslt fixes bugs and cleans up the code and docs. The new version of libxml fixes a few bugs and adds fragment identifier support to the document() function and EXSLT URI escaping.


Version 0.8.0 of TM4J, an open source Java toolkit for parsing, manipulating and exporting topic map data, has been released. "TM4J supports import of XTM and LTM topic map interchange formats; a complete data model with a variety of persistence mechanisms; an implementation of the tolog topic map query language; and a collection of command-line tools and programming utility classes to make it easy to work with topic map data in an application development environment. This latest stable release of TM4J incorporates a new persistent storage model which supports a variety of relational backends."

Tuesday, January 14, 2003

Peter Flynn's published version 3.0.1 of his XML FAQ. This release "Added information on Office Applications including Corel, Microsoft, and Sun (to keep alphabetical order :-); updated details of conferences and training; updated browser details; reworded a few ungainly sentences; removed some obsolete URLs (mostly for nice idea sites which died); changed the phrasing of the question on databases; added details on how to do standalone validation to the question on parsing (thanks to Bill Rayer); added question on how to present XML to management (thanks to Tad McClellan); the questions on APIs and the DOM have been subsumed into the question on software, which has been extensively rewritten; added yet more explanation to the section on Unicode; 3.01 fixes minor typos."


Norm Walsh has released version 1.59.1 of his XSLT stylesheets for DocBook. This release features numerous small enhancements and bug fixes. However, several people have reported that it may have problems running with the latest version of FOP.


A new working draft of Streaming Transformations for XML (STX) has been posted. "Streaming Transformations for XML (STX) is a one-pass transformation language for XML documents that builds on the Simple API for XML (SAX). STX is intended as a high-speed, low memory consumption alternative to XSLT. Since it does not require the construction of an in-memory tree, it is suitable for use in resource constrained scenarios."
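STX itself is a declarative language, but the one-pass, SAX-based style it builds on can be shown with ordinary SAX code. The following is not STX; it is a minimal Python sketch using the standard library's `XMLFilterBase` and `XMLGenerator` that renames elements as events stream through, without ever building an in-memory tree. The `RenameFilter` class and the `renames` mapping are illustrative names.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class RenameFilter(XMLFilterBase):
    """One-pass transform: rename elements as SAX events pass through."""

    def __init__(self, parent, renames: dict) -> None:
        super().__init__(parent)
        self.renames = renames

    def startElement(self, name, attrs):
        super().startElement(self.renames.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self.renames.get(name, name))

def transform(xml_text: str, renames: dict) -> str:
    out = io.StringIO()
    stream = RenameFilter(xml.sax.make_parser(), renames)
    stream.setContentHandler(XMLGenerator(out, encoding="utf-8"))  # serializes events
    stream.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()

print(transform("<doc><para>Hi</para></doc>", {"para": "p"}))
```

Memory use stays roughly constant no matter how large the input is, which is the property that makes streaming transformations attractive; the trade-off is that a filter like this can only look at the current event, not arbitrary parts of the tree as XSLT can.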


Berin Lautenbach's posted a beta of XML-Security-C, a partial open source C++ implementation of XML Digital Signatures.

Monday, January 13, 2003

The W3C Multimodal Interaction Working Group has published a note on the Multimodal Interaction Requirements. This group is trying to figure out how different inputs such as speech, handwriting, keyboards, and so forth can be connected up with different outputs such as audio, video, and screens within the same system. The goal is to allow content and processing to be decoupled from the specific input and output methods. "The requirements cover general issues, inputs, outputs, architecture, integration, synchronization points, runtimes and deployments, but this document does not address application or deployment conformance rules."

Saturday, January 11, 2003

RO IT Systems has released version 2.26 of the Perl SVG module to CPAN and its mirrors. This release is now compatible with Perl versions up to 5.8 and thread safe.


Ernst de Haan's posted xmlenc 0.22, an open source library for streaming XML output. It's marginally more convenient than out.println(). However, it does not guarantee well-formedness of the output, which to my mind is a sine qua non for any XML output.


Apple's posted a new beta of their Safari web browser. This is a bug fix release.

Friday, January 10, 2003

The W3C DOM Working Group has released the final recommendation of Document Object Model (DOM) Level 2 HTML Specification. The abstract states, "This specification defines the Document Object Model Level 2 HTML, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of [HTML 4.01] and [XHTML 1.0] documents. The Document Object Model Level 2 HTML builds on the Document Object Model Level 2 Core [DOM Level 2 Core] and is not backward compatible with DOM Level 1 HTML [DOM Level 1]."


The W3C XML Schema Working Group has posted the first public working draft of XML Schema: Component Designators. This spec "defines a scheme for identifying the XML Schema components specified by XML Schema Part 1: Structures and XML Schema Part 2: Datatypes." Schema components that need to be identified include:

  • Simple and complex type definitions
  • Attribute declarations
  • Element declarations
  • Attribute and model group definitions
  • Identity-constraint definitions
  • Notation declarations
  • Annotations
  • Model groups
  • Particles
  • Wildcards
  • Attribute uses
  • The master schema component representing the schema as a whole.
  • Facets

The goal is to be able to name, for example, the literallayout notation in the DocBook schema, as well as every other significant piece of the schema. Neither qualified names nor URIs obviously solve this problem.


Ovidiu Predescu and Tony Addyman have released XSLT-process 2.2, an Emacs minor mode that can run an XSLT processor on a buffer and display the result in another buffer or in a browser. It also supports XSLT debugging mode. Java 1.3 or later is required.


Andy Clark has posted version 0.7.2 of the CyberNeko HTML Parser, a "simple HTML scanner and tag balancer that enables Java application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and 'fix up' many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation." This release fixes various bugs.

Thursday, January 9, 2003

Daniel Veillard's released version 2.5.1 of libxml2, the GNOME XML parser for Linux. Version 2.5.1 plugs a memory leak introduced in 2.5.0 and


Christophe de Vienne's posted version 0.18 of libxml++, a C++ wrapper around the Gnome Project's libxml.


Peter J. Jones has posted his own xmlwrapp 0.3.0, another C++ wrapper for libxml2. xmlwrapp is published under a Berkeley license.


Version 0.97 of Sablotron, an open source XML processor for C++ has been released. Sablotron supports XSLT 1.0, XPath 1.0, DOM Level 2, and some extension functions from EXSLT. Version 0.97 focuses on XSLT improvements including support for the xsl:import, xsl:strip-space, xsl:preserve-space, and exsl:document elements and the unparsed-entity-uri() function. An XSLT debugger is also new in this release. Sablotron is dual licensed under the Mozilla Public License 1.1 and the GNU General Public License (GPL). It should run on most modern Windows and Unixes.

Wednesday, January 8, 2003

Yesterday at MacWorld in San Francisco, Steve Jobs announced Safari, a new web browser Apple has developed for Mac OS X based on Konqueror's HTML rendering engine. It shows some original thought in UI design including better bookmark management, snapback for Google searches, and auto-cleanup of decompressed downloads (not so sure I like this one but you can probably turn it off). It is allegedly the fastest browser on Mac OS X (though Mozilla was conspicuously absent from the comparative benchmarks and charts Jobs displayed). Safari supports HTML, XHTML, DOM, CSS, JavaScript, QuickTime, Flash and Shockwave. However, in my initial tests XML does not seem to be supported with either CSS or XSLT stylesheets. SVG is also not supported. Java is supported. However, initial reports are that Java support is extremely buggy and will crash the browser in short order. Safari is free-beer for Mac OS X. Other missing features include AutoFill Forms, ad blocking, tabbed browsing, full cookie control, and support for right-to-left languages like Hebrew.

Jobs also announced Keynote, a $99 payware presentation program sort of like PowerPoint. Notable for readers of this web site is that it saves its files in a native XML format. (It also imports and exports PowerPoint files, as well as PDF and many graphics formats.)


The first beta of OpenOffice 1.0 for Mac OS X has been posted. This open source office suite includes a word processor, spreadsheet, presentation program, and draw program. On Mac OS X, it uses the X Window System for display; future versions may use Aqua instead. OpenOffice saves its native files as gzipped XML. It can also read and write Microsoft Office documents. A final release is planned for the spring.

Tuesday, January 7, 2003

Daniel Veillard's released version 2.5 of libxml2, the GNOME XML parser for Linux. Version 2.5 adds a new xmlTextReader interface modeled on the C# XmlTextReader API, a new API to track node creation and deletion, and improved Python wrappers.

Monday, January 6, 2003

Fluxmedia and INRIA have released Transmorpher 1.0, "a software tool for defining and processing complex transformations of XML documents. It can accept external transformations (e.g., XSLT stylesheets) and provide a simple transformation language offering unit transformations (suppression, renaming, regular expression substitutions and query facilities). In addition to generating, transforming and serializing XML documents, it features constructors like merging, dispatching, querying, iterating, and composing transformations. These transformations can have several input and output streams. New implementations of these constructors can be plugged into Transmorpher. Transmorpher can be used as a compiler, an interpreter, an Ant task, a Servlet generator or embedded in another program."

Saturday, January 4, 2003

IBM's alphaWorks has updated its XML Registry/Repository (XRR), a data management system that provides services for XML artifacts including DTDs, schemas, stylesheets, and instance documents. Developers can use XRR to obtain an XML artifact automatically, search or browse for an XML artifact, deposit an XML artifact with or without related data, and register an XML artifact without deposit. Users can search for registered objects based on their metadata. "New features enable registering, storing, and managing of XML artifacts such as DTDs, schemas, and style sheets." XRR runs on AIX, Linux, Solaris, and Windows NT/2000.

Friday, January 3, 2003

I've made a slight adjustment to my .htaccess file to try to fix some old broken links. Please let me know if you notice anything out of sorts here.

Thursday, January 2, 2003

Gerald Bauer has posted the first beta of Cypress, an open source Cascading Style Sheets (CSS) parser. Cypress is published under the GPL.

Wednesday, January 1, 2003

Johannes Dobler's released version 1.4 of jd.xslt, an open source XSLT processor written in Java that supports most of the now-defunct XSLT 1.1 working draft. This release fixes tail recursion (again), adds an Ant buildfile, upgrades from the old IBM version of the Bean Scripting Framework to the new Apache version, supports pull parsers, supports XInclude, and works better with large (over 20 MB) documents.



Copyright 2003 Elliotte Rusty Harold
elharo@ibiblio.org
Last Modified January 13, 2004