XML News in 2005

2005 XML News

Saturday, December 31, 2005 (Permalink)

Syntext has released Serna 2.4.0, a $268 payware XSL-based WYSIWYG XML Document Editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, XInclude, and spell checking. Version 2.4 adds some MathML support. A roughly $500 enterprise edition adds a Python API and WebDAV support.

Cladonia Ltd. has released Exchanger XML Lite 3.2, a free-as-in-beer XML editor written in Java that runs on most desktop platforms. Features include:

Schema Based Editing
Tag Prompting
Validation against DTD, XML Schema, RelaxNG
XPath and Regular expression searches
Schema Conversion
XQuery
Database and Excel Import
XML Digital Signatures
DTD editing
XSLT Debugger

A $130 payware professional version adds diff and merge, and a grid view.

Friday, December 30, 2005 (Permalink)

The Mozilla Project has posted the second beta of Camino 1.0, a Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. Version 1.0 mostly fixes bugs and speeds up the browser. This beta has several security fixes so all users should upgrade. Camino is free for Mac OS X 10.2 through 10.4. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Mac OS X 10.2 or later is required.

Thursday, December 29, 2005 (Permalink)

The W3C CSS working group has posted the last call working draft of Selectors. "This document describes the selectors that already exist in CSS1 and CSS2, and also proposes new selectors for CSS3 and other languages that may need them." New features include:

XML namespace support
The tilde (~) is used as general sibling combinator. For example, hr ~ p matches a p element following an hr element.
Substring matching attribute selectors
:root, :nth-child(), :nth-last-child(), :nth-of-type(), :nth-last-of-type(), :first-child, :last-child, :first-of-type, :last-of-type, :only-child, :only-of-type, :empty, and negation pseudo-classes
::selection pseudo-element
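
To make the general sibling combinator concrete, here is a minimal Python sketch of what hr ~ p selects: every p element that follows an hr element under the same parent. The function name general_siblings and the sample markup are invented for illustration; neither comes from the Selectors draft, and a real CSS engine does far more than this.

```python
import xml.etree.ElementTree as ET

def general_siblings(root, first, second):
    """Return elements named `second` that follow an element named
    `first` under the same parent (the `first ~ second` relationship)."""
    matches = []
    for parent in root.iter():
        seen_first = False
        for child in parent:
            # A match needs an earlier sibling named `first`.
            if child.tag == second and seen_first:
                matches.append(child)
            if child.tag == first:
                seen_first = True
    return matches

# Hypothetical sample document, made up for this example.
doc = ET.fromstring(
    "<body><p>before</p><hr/><p>one</p><div><p>nested</p></div><p>two</p></body>"
)
texts = [p.text for p in general_siblings(doc, "hr", "p")]
print(texts)
```

The paragraph before the hr and the one nested inside the div are not selected, because neither has an hr as an earlier sibling.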

Comments are due by January 16.

Wednesday, December 28, 2005 (Permalink)

The Eclipse Project has released Web Tools 1.0, a collection of XML and Web Services editors for the open source Eclipse IDE. I've not been incredibly impressed with this. As XML editors go, it's pretty weak, certainly not up to the level of Oxygen, for example. Both features and user interface don't seem to be closely tied to how XML is actually edited. Starting with the installation process, you feel like you're fighting with the tool rather than working with it. It's not obvious to me that the programmers working on this had a clear vision of how an XML (or web services) editor should work. Instead it looks like they simply made a checklist of features which were then implemented independently without any real integration between the people working on different pieces. I'd be extremely surprised if any actual user testing was done on this. The whole project makes the classic open source mistake of focusing on how the developers view the project instead of how users see it. Eclipse 3.1.1 is required, but frankly I wouldn't bother. This software is a turkey. Leave it on the farm.

Tuesday, December 27, 2005 (Permalink)

Wolfgang Hoschek has released NUX 1.4.1, an open source add-on package for XOM that connects it to Michael Kay's Saxon 8 XSLT 2/XPath 2/XQuery processor, the Sun Multi-Schema Validator, and the Apache Lucene fulltext search engine. It also provides thread-safe factories and pools for creating XOM Builder objects. NUX also includes yet another non-XML binary format. Version 1.4 updates the dependent libraries, improves XQuery performance, removes deprecated methods, and adds assorted utility methods here and there throughout the package. NUX is published under a modified BSD license (no advertising clause).

The XML Apache Project has posted the first beta of FOP 0.91, an open source XSL Formatting Objects to PDF/PostScript/RTF converter written in Java. Besides numerous bug fixes, this release adds:

SVG support in RTF output.
An "alternative set of rules for calculating text indents which tries to mimic the behaviour of many commercial FO implementations that chose to break the rules in the FO specification in order to better meet the natural expectations of inexperienced FO users."
Some "Overconstrained Geometry" rules
Support for leader painting in PostScript output.
hyphenation-ladder-count

Java 1.3 or later is required.

Saturday, December 24, 2005 (Permalink)

Dennis Sosnoski has released JiBX 1.0, yet another open source (BSD license) framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding document camp as opposed to the schema driven binding frameworks like JaxMe and JAXB. Quoting from the JiBX web site,

JiBX is a framework for binding XML data to Java objects. It lets you work with data from XML documents using your own class structures. The JiBX framework handles all the details of converting your data to and from XML based on your instructions. JiBX is designed to perform the translation between internal data structures and XML with very high efficiency, but still allows you a high degree of control over the translation process.

How does it manage this? JiBX uses binding definition documents to define the rules for how your Java objects are converted to or from XML (the binding). At some point after you've compiled your source code into class files you execute the first part of the JiBX framework, the binding compiler. This compiler enhances binary class files produced by the Java compiler, adding code to handle converting instances of the classes to or from XML. After running the binding compiler you can continue the normal steps you take in assembling your application (such as building jar files, etc.). You can also skip the binding compiler as a separate step and instead bind classes directly at runtime, though this approach has some drawbacks.

The second part of the JiBX framework is the binding runtime. The enhanced class files generated by the binding compiler use this runtime component both for actually building objects from an XML input document (called unmarshalling, in data binding terms) and for generating an XML output document from objects (called marshalling). The runtime uses a parser implementing the XMLPull API for handling input documents, but is otherwise self-contained.
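
JiBX is a Java framework, and nothing below is its API or its binding document format. As a rough illustration of the idea in the quoted passage, this Python sketch uses a hypothetical binding table (BINDING) to drive both unmarshalling (XML to object) and marshalling (object to XML) for an invented Customer class. Unlike JiBX, it does no bytecode enhancement; the table is simply consulted at runtime.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    city: str

# Hypothetical binding definition: XML child element name -> object field.
BINDING = {"full-name": "name", "home-city": "city"}

def unmarshal(xml_text):
    """Build a Customer from XML, following the binding table."""
    root = ET.fromstring(xml_text)
    fields = {attr: root.findtext(tag) for tag, attr in BINDING.items()}
    return Customer(**fields)

def marshal(customer):
    """Serialize a Customer back to XML, following the same table."""
    root = ET.Element("customer")
    for tag, attr in BINDING.items():
        ET.SubElement(root, tag).text = getattr(customer, attr)
    return ET.tostring(root, encoding="unicode")

source = "<customer><full-name>Ada</full-name><home-city>London</home-city></customer>"
customer = unmarshal(source)
round_trip = marshal(customer)
print(customer)
print(round_trip)
```

Because the element names live only in the binding table, the Customer class can be renamed or restructured without touching the XML vocabulary, which is the main appeal of the custom-binding approach.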

Friday, December 23, 2005 (Permalink)

The Mozilla Project has posted the first beta of SeaMonkey 1.0. This is the continuation of the integrated Mozilla suite, and has XML support roughly equivalent to Firefox 1.5 (e.g. XML, XSLT, CSS, XHTML, etc.). There's something to be said for having the e-mail client, web editor, browser, and more rolled into one application. However, there's little to be said for maintaining the same ugly user interface of the old Mozilla builds. I didn't realize it until I tried switching back after surfing with Firefox for some months; but there's more to Firefox than just a stripped down Mozilla. I can't quite put my finger on it, but Firefox just looks prettier than Mozilla/SeaMonkey does. It sounds trivial, but if you try using both I think you'll vastly prefer Firefox.

The Mozilla Project has also released Thunderbird 1.5. I've been using Thunderbird for the last few months as well. However, I was just about to give up on it because an interaction of several different bugs in Thunderbird and IBiblio's overloaded IMAP server was forcing me to quit and relaunch Thunderbird every fifteen minutes or so. Fingers crossed, but the final 1.5 release seems to have fixed or at least drastically reduced those problems. I haven't seen them since upgrading two days ago. There are still some missing features, but overall Thunderbird seems to be the best mail client currently available on the Mac. It's vastly superior to Apple Mail, Outlook, and even the Eudora that I'd been using for ten years previously.

Thursday, December 22, 2005 (Permalink)

Sleepycat Software has released Berkeley DB XML 2.2.13, an open source "application-specific, embedded data manager for native XML data" based on Berkeley DB. It supports the April 2005 working drafts of XQuery 1.0 and XPath 2.0 (not the more recently released Candidate Recommendations). It includes C++, Java, Perl, Python, TCL and PHP APIs. This is a bug fix release.

The RSS feeds may be temporarily broken at various times today while I work on upgrading them. Update: it should all be fixed now. If you notice any remaining problems, please holler.

The OpenOffice Project has released OpenOffice 2.0.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. According to Louis Suarez-Potts, "The main focus of the new release was correcting bugs, in particular in localisations. However, a number of new features were added as well. So, for example, it is now possible to disable and hide particular application settings, which comes in handy for central administration in networks. Moreover, a new keyboard shortcut permits the user to return to a saved cursor position. The bullets and numbering feature has been expanded, and a new mail merge feature is available. Last but not least, Macedonian has been added as an official language." OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Wednesday, December 21, 2005 (Permalink)

The W3C XForms working group has posted the second public working draft of XForms 1.1. Changes since 1.0 include:

A new namespace URI, http://www.w3.org/2004/xforms/
power, luhn, current, choose, id and property XPath extension functions
An e-mail address datatype
An ID card number datatype
A prompt action element
An xforms-close event
An xforms-submit-serialize event
Inline rendering of non-text media types
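
The luhn function in the list above is named after the Luhn checksum used for credit card and similar ID numbers. The working draft's exact signature and edge-case handling are not reproduced here; the following is only a minimal Python sketch of the checksum itself, and the choice to ignore spaces and other non-digit characters is an assumption of this example.

```python
def luhn(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Assumption of this sketch: separators such as spaces are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn("4111 1111 1111 1111"))
print(luhn("79927398714"))
```

The first number is a widely used test card number that passes the check; the second has a wrong final digit and fails.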

The Helsinki University of Technology has released X-Smiles 0.96, a proof-of-concept XForms engine written in Java. It isn't very polished, but it does attempt to run on most platforms. This release adds zooming and fixes bugs, though a lot of work remains to be done.

Recordare has released Dolet 3.0 for Sibelius, a $129.95 payware plug-in for reading and writing MusicXML files. This release adds support for MusicXML 1.1. Upgrades are $79.95. Sibelius 4.0 or later is required.

Tuesday, December 20, 2005 (Permalink)

JAPISoft has released EditiX 4.3, a $99 payware XML editor written in Java. Features include XPath location and syntax error detection, context sensitive popups based on DTD, W3C XML Schema Language, and RelaxNG schemas, XSLT and XSL-FO previews, XInclude, XML catalogs, an XSLT debugger, DocBook support, and multi-view preview. Version 4.3 adds a few small features and fixes bugs. EditiX is available for Mac OS X, Linux, and Windows.

Monday, December 19, 2005 (Permalink)

Microsoft is officially halting distribution of Internet Explorer for the Mac at the end of next month. You'll probably want to archive a copy or two now for testing purposes, as well as to use with sites like the Proximus wireless "access" point at Javapolis last week that wouldn't let anybody in unless they were using IE. This had more than a little to do with the silence of my sites over the last week, as well as the paucity of reports from what was a quite interesting show for those who were there. Poor wireless access (the IE requirement was not the only problem) meant relatively few people were able to chat about or report from the show in real time.

Putting such blatantly bad design on display in front of an audience of 2000 alpha geeks, almost every one of whom could probably explain in intimate detail exactly what Proximus did wrong, is not exactly the smartest viral marketing a company might do. In fact, that's an idea. Next year let's do a reverse keynote where the CEO and CTO of Proximus have to stand in front of the convention and listen to everyone in the audience tell them how to fix their broken system. It used to be that only internal users suffered through such brain damage and poor design; but with web apps everyone gets to see just how incompetent your team really is. Hmm, there's another idea. How about a mutual fund that makes investment decisions by analyzing a company's public web applications to figure out which companies hire the pointy-haired and which don't?

Thursday, December 15, 2005 (Permalink)

I've posted the notes from my XOM Design Principles and Effective XML talks at Javapolis. The latter is a very different talk than the ones of the same name I've given in the past. This is a more basic talk focussing on introducing XML and best practices at the same time. This conference asked me to give a talk that could cover all levels from novice to advanced. I'm not sure that really worked. It's hard to go fast enough to interest the experienced users without leaving the beginners behind. I'm planning on giving a re-titled version of this talk at SD 2006 West in Santa Clara next March, but that will be focused more directly at beginners.

Sunday, December 11, 2005 (Permalink)

From the irony escapes the tech-unsavvy department, I note that Reporters sans Frontieres' Handbook for bloggers and cyber-dissidents attempts to set multiple cookies. Folks, if you're trying to teach people how to post anonymously, don't leave a trail of cookie crumbs behind. :-(

Saturday, December 10, 2005 (Permalink)

I spent today at the first Weekend with Experts conference in Pennsylvania. I've posted the notes from my presentations on XQuery and Effective XML.

Tomorrow I leave for Antwerp and Javapolis, so updates here will be infrequent until I return next week.

Friday, December 9, 2005 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted a third last call working draft of Scalable Vector Graphics (SVG) Tiny 1.2. The changes are fairly detailed, but don't appear to be hugely substantive. Mostly they're clarifications and cleanups.

Thursday, December 8, 2005 (Permalink)

Code Synthesis has released xsd 1.7.0, an open source (GPL) W3C XML Schema language based data binding tool for C++. This release adds xsd:union support.

YesLogic has released Prince 5.1, a $349 payware batch formatter for Linux, Windows, and Mac OS X that produces PDF and PostScript from XML documents with CSS stylesheets. This release adds support for legacy HTML and transparent PNG and GIF images. Prince now passes the Acid2 test.

Wednesday, December 7, 2005 (Permalink)

Oleg Tkachenko has released nxslt 2.0, an open source (BSD license) Windows command line utility for accessing the .Net XSLT engine. Version 2.0 uses the faster XslCompiledTransform class and includes support for XInclude 1.0, more than EXSLT and EXSLT.NET extension functions, multiple output documents, embedded stylesheets, custom XmlResolvers and custom extension functions, and pretty printing.

Tuesday, December 6, 2005 (Permalink)

Guess what this document is:

In fact, it's the binary hash Microsoft Word made out of Chapter 24 in my next book. I'm not certain, but the proximate cause seems to be editing the file with both Office 97 on Windows and Office 2004 on the Mac. Fortunately I had a backup of this document from yesterday morning, and yesterday was more of a research and coding day than a writing day, so I didn't lose too much. Still if Microsoft can't even keep their file formats stable and reliable for one decade, why do they expect governments to use them for archival storage?

Monday, December 5, 2005 (Permalink)

The W3C XML Key Management Service (XKMS) Working Group has published a note on WSDL 1.1 description for XKMS. The group "has defined a Web Service to handle conventional PKI (public-key infrastructure) functions such as registration, revocation and status, as well as related functions such as retrieval. This note provides a sample Web Services Description Language (WSDL) 1.1 description for an XKMS service."

Sunday, December 4, 2005 (Permalink)

Cool discovery of the day: Bug Me Not now has a Firefox extension. Hit a site that pointlessly requires registration? Just right click in the user name box and select "Bug Me Not"; and it will fill in a probably valid user name and password! It's the easiest way to use Bug Me Not yet.

Saturday, December 3, 2005 (Permalink)

I've posted the initial notes for Syndication: RSS, ATOM, OPML, and All That. This session focuses on explaining RSS and ATOM to software developers (as opposed to content authors). In other words, it delves into the nitty gritty of how these systems work, and explains how to write software that generates and consumes feeds. I presented this lecture in my XML class at Polytechnic Thursday night. I was planning it for about 90 minutes, but I think it might need more like three hours to really do justice to the material. I didn't even get into the section on the Atom Publishing Protocol.
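
As a small taste of the consuming side, here is a minimal Python sketch that pulls item titles and links out of an RSS 2.0 document using only the standard library. The feed content and the read_items helper are invented for illustration. Real feeds are much messier than this (namespaced extensions, escaped HTML, malformed markup), and Atom needs namespace-aware lookups, none of which this sketch handles.

```python
import xml.etree.ElementTree as ET

# A made-up feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>"""

def read_items(feed_text):
    """Return (title, link) pairs for each item in an RSS 2.0 channel."""
    channel = ET.fromstring(feed_text).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

items = read_items(SAMPLE_FEED)
for title, link in items:
    print(title, link)
```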

Hopefully I'll have an opportunity or two to develop this session further, and present it at some conferences and user groups over the coming year. Drop me an e-mail if you'd like me to talk to your user group, company, or conference about this. In fact, it occurs to me that there might be some interest in this at some conferences that aren't specifically focused on XML: e.g. web conferences, Ruby conferences, PHP conferences, etc. The technology is pretty important and fairly language and platform agnostic. If you hear of any Calls for Proposals for conferences that might be interested in a session or two on this subject, especially conferences that are willing to pay speakers, please drop me a line. Thanks.

Speaking of conferences, I've updated the conferences page with dates and info for Extreme Markup Languages 2006 (August 7-11 in Montreal) and XML 2006 (November 13-17 in Seattle). I probably won't be at Extreme this year. I'm thinking I may go to XML 2006 though. I'm up in the air about XTech 2006 (May 16-19, Amsterdam). These are all good shows, but they pay bupkus; so it's difficult to justify going to all three every year. I will definitely be at Software Development West 2006 in Santa Clara in March though.

Friday, December 2, 2005 (Permalink)

Opera Software has posted preview releases of version 9.0 of their namesake free-beer web browser for Windows, Unix, and the Mac. Most notably this release (finally!) adds support for XSLT. At least they say it does. In my tests, it couldn't render this XML page styled with XSLT. Other XMLish improvements include more accurate CSS rendering, Web Forms 2.0, Apple's canvas element, Atom 1.0, and xml:id. This release also removes support for XML namespaces in HTML documents, and can re-parse invalid XML documents as HTML after XML parsing has failed.

Thursday, December 1, 2005 (Permalink)

I'm pleased to announce the release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath queries, document subset canonicalization, exclusive XML canonicalization, external XSLT parameters, and xml:id support. It also fixes a number of bugs that were present in XOM 1.0, uses less memory, and is two to four times faster for many common operations.

The addition of XPath is especially significant. It removes the last remaining reason you might plausibly choose JDOM or dom4j instead of XOM. Going forward I think you'll find that XOM is more robust, faster, smaller, better documented, and much, much easier to use than the alternatives. While there's a lot of working legacy code out there using JDOM or dom4j that no one's going to throw away, new projects should seriously consider XOM. In my not so humble opinion, it is demonstrably the best library of its type. There are still use cases for which one would choose a pure streaming API such as SAX, StAX, or XNI instead. However if you want an XML tree model in Java, XOM is the obvious choice.

Wednesday, November 30, 2005 (Permalink)

The Mozilla Project has released Firefox 1.5. New features in 1.5 for XML developers include:

CSS @-moz-document selector for matching on site/document URL, useful in user stylesheets
SVG support is now turned on by default.
XML Events in JavaScript
An extension to support XForms
CSS 2 quotes support
CSS 2 counters support
Lots of neat CSS 3 features
URIs are now always encoded in UTF-8
Partial support for E4X

Other new features in 1.5 include:

Sanitize "provides an easy way to quickly remove browsing history, cookies, cache, saved form information, and other personal data. The items to be removed can be customized, and the feature can be activated using either a keyboard shortcut or through a menu item."
When viewing images, tab icons now display thumbnails of the displayed image.
"Much faster session history navigation. The feature is off by default but can be enabled for testing purposes by setting the browser.sessionhistory.max_viewers preference to a nonzero number."
FTP users are prompted for a name and password if anonymous access fails.
Report a broken website wizard
Changes made in the Preferences window now apply immediately
Searchable download actions manager
Searchable cookie manager

I've been using 1.5 since the alpha 2 release a few months ago. It feels a little slower than 1.0 but otherwise has been quite stable. There do not appear to be any significant changes since the last release candidate.

Tuesday, November 29, 2005 (Permalink)

Todd Ditchendorf has released XML Nanny 1.3, a free-as-in-beer Mac OS X program that checks XHTML and XML documents for well-formedness and validity. Mac OS X 10.4 or later is required.
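
For readers who want the well-formedness half of that check without a GUI, here is a minimal Python sketch built on the standard library's non-validating parser. The is_well_formed helper is an invented name, and this has nothing to do with how XML Nanny is implemented. Validity checking against a DTD or schema is a separate step that needs a validating parser and is not attempted here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b/></a>"))   # properly nested
print(is_well_formed("<a><b></a>"))    # <b> is never closed
```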

Version 2.0.2014 of Vienna, an open source RSS/Atom client for Mac OS X, has been released. This is a bug fix release. This is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) I've also improved the experience a little by installing Feed Your Reader into Firefox so I can now add subscriptions to Vienna directly from Firefox.

Monday, November 28, 2005 (Permalink)

I've launched a new weblog, Mokka mit Schlag. This is going to be my site going forward for anything not specifically related to Java or XML. The RSS 2 and Atom feeds are full text for those of you who prefer to read in your news browsers. Comments are enabled on all posts. Initial entries cover a range of topics from SQL to birding to Disneyland to Software Testing to Ruby on Rails and combinations thereof. If you're curious as to why I'm doing yet another site, you might want to read Why This Site? and/or Addicted to Blogging. Enjoy!

Sunday, November 27, 2005 (Permalink)

The W3C Web Accessibility Initiative Protocols and Formats Working Group has published a note on the Inaccessibility of CAPTCHA: Alternatives to Visual Turing Tests on the Web. According to the note,

Web sites with resources that are attractive to aggregators (travel and event ticket sites, etc.) or other forms of automation (Web-based email, weblogs and message boards) have taken measures to ensure that they can offer their service to individual users without having their content harvested or otherwise exploited by Web robots.

The most popular solution at present is the use of graphical representations of text in registration or comment areas. The site attempts to verify that the user in question is in fact a human by requiring the user to read a distorted set of characters from a bitmapped image, then enter those characters into a form.

Researchers at Carnegie Mellon University have pioneered this method, which they have called CAPTCHA (Completely Automated Public Turing test to Tell Computers and Humans Apart) [CAPTCHA]. Various groups are at work on projects based on or similar to this original, and for the purpose of this paper, the term "CAPTCHA" is used to refer to all of these projects collectively. A Turing test [TURING], named after famed computer scientist Alan Turing, is any system of tests designed to differentiate a human from a computer.

This type of visual and textual verification comes at a huge price to users who are blind, visually impaired or dyslexic. Naturally, this image has no text equivalent accompanying it, as that would make it a giveaway to computerized systems. In many cases, these systems make it impossible for users with certain disabilities to create accounts, write comments, or make purchases on these sites, that is, CAPTCHAs fail to properly recognize users with disabilities as human.

Saturday, November 26, 2005 (Permalink)

The Apache XML Project has released XML Security v1.3, an implementation of security related XML standards including Canonical XML, XML Encryption, and XML Signature Syntax and Processing. A compatible Java Cryptography Extension provider is required. Version 1.3 improves performance and fixes bugs.
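
Canonicalization is what makes XML signatures workable: two documents that differ only in lexical details such as attribute order or empty-element syntax must reduce to the same bytes before they are hashed. The following minimal Python sketch shows that effect with the standard library (Python 3.8 or later). Note the assumption: Python's canonicalize function implements the later C14N 2.0 specification, not necessarily the canonicalization versions Apache XML Security 1.3 supports, and the sample documents are invented.

```python
import xml.etree.ElementTree as ET

# Two lexically different spellings of the same document (made up).
first = '<doc b="2"   a="1"><empty/></doc>'
second = "<doc a='1' b='2'><empty></empty></doc>"

canonical_first = ET.canonicalize(first)
canonical_second = ET.canonicalize(second)

print(canonical_first)
print(canonical_first == canonical_second)
```

The raw strings differ, but their canonical forms are identical, so a digest computed over the canonical form would match for both.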

I've posted the seventh beta/first release candidate of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath queries, document subset canonicalization, exclusive XML canonicalization, external XSLT parameters, and xml:id support. The API is now considered to be stable, and probably won't change before 1.1 final. Beta 7 fixes two minor, almost cosmetic bugs in the Serializer and plugs a possible memory leak in the Builder. Barring discovery of any more bugs, this should be the last beta before the final release of XOM 1.1 next week. I'd really appreciate it if anyone who's been using it could give this release candidate a spin to make sure the latest fixes haven't broken anything else. XOM requires Java 1.2 or later and is published under the LGPL.

Friday, November 25, 2005 (Permalink)

Michael Kay has released version 6.5.5 of Saxon, an XSLT 1.0 processor written in Java. Saxon is open source under the Mozilla Public License. Java 1.2 or later is required. This release fixes a couple of bugs, but adds no new features.

Thursday, November 24, 2005 (Permalink)

Michael Kay has released version 8.6.1 of Saxon, his XSLT 2.0 and XQuery processor. This release adds a new saxon:deep-equal() extension function that "is similar to fn:deep-equal() but with an extra parameter to control the precise details of how the comparison is done. This was found useful as a means of comparing test results for the XQTS test suite with the published results." Assorted bugs are fixed as well. Saxon is published in two versions, for both of which Java 1.4 or later is required. Saxon 8.6B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.6SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Wednesday, November 23, 2005 (Permalink)

The XML Apache Project has posted the first alpha of FOP 0.90, an open source XSL Formatting Objects to PDF/PostScript/RTF converter written in Java. This is the first preview release after three years of redesign. New features include keeps on all implemented FO elements, reference-orientation, better indent behaviour, various improvements on inline elements like baseline-shift and improved leaders and image handling, and improved border painting. According to Jeremias Maerki, "This release is the first after a big redesign effort on the whole FOP codebase. This release is to be considered ALPHA quality and it is intended as a preview release encouraging people to take a look at the new version and to provide feedback to the developers. Please not only report to us problems you might experience but also tell us if it works for you. If you find out that this version works fine for you, you're welcome to use it but please test it thoroughly as we don't consider this release ready for every production environment." Java 1.3 or later is required.

YesLogic has released Prince 5.0 r4, a $349 payware batch formatter for Linux, Windows, and Mac OS X that produces PDF and PostScript from XML documents with CSS stylesheets. This is a bug fix release.

Tuesday, November 22, 2005 (Permalink)

I think I've finally found a decent RSS browser. Vienna is an open source client for Mac OS X that satisfies all my must have items, including newsfeed aggregation, OPML import and export, easy keyboard navigation, and more. There are no obvious bugs or user interface glitches. (OK. I found one minor glitch. It confirms deleting feeds rather than allowing the deletion to be undone. That's an extremely common mistake. More on that topic on another site soon.)

In one respect, it even exceeds my requirements. I had asked for a browser that hides all read news items, which it can do. However, it also lets me delete news items individually. Thus I can save the items I may want to reread or come back to while deleting most things so I never see them again. Very slick! It also has a significant AppleScript dictionary. It's hard to judge the quality of such a thing without actually writing a few scripts for it, but it looks better than most commercial products.

Vienna isn't written in Java, so I probably won't be able to contribute much back; but on the other hand I'm not sure I need to. It works pretty damn well out of the box.

Monday, November 21, 2005 (Permalink)

From the seriously silly version number department, Planamesa Software has released NeoOffice/J 1.1 Patch-3-With-Java-1.4.x-Update-1, a Mac port of the open source OpenOffice suite. This is a bug fix release. Mac OS X 10.2 or later is required. NeoOffice is published exclusively under the GPL.

Saturday, November 19, 2005 (Permalink)

The Apache Project has released Cocoon 2.1.8, an open source "web development framework built around the concepts of separation of concerns and component-based web development. Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming." Cocoon can assemble data from many sources including filesystems, SQL databases, LDAP, native XML databases, and SAP. It can customize the output to generate HTML, WML, PDF, SVG, and RTF from the same inputs. Processes it supports include XSL transformation and XInclude resolution. Cocoon can run as a servlet inside an existing web server or standalone through a commandline interface. New features in 2.1.8 include:

Many enhancements to the forms block including AJAX support for partial updates to a form, a new tree widget, some experimental code for reusable form libraries (coded as a part of the Google Summer of Code project) and a sample showing how to create forms using relational databases with zero Java code.
Cocoon Stack Traces
Many enhancements to the portal block, including improved caching mechanisms, support for the Web Services For Remote Portlets (WSRP) standard, and provided components for database access using OJB.
A new JCR block allowing access to JCR repositories such as JackRabbit
A new validation block providing the ability to validate XML in a pipeline choosing from a range of schema languages (DTD, XSD, RNG)
The ability to use Cocoon pipelines to render JSF pages (using the JSF controller)
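
The "component pipeline" idea in the Cocoon description can be sketched in a few lines of Python. This is only an illustration of the concept, not Cocoon's API or sitemap syntax. The names generate, transform, serialize, and run_pipeline are hypothetical stand-ins for a Cocoon generator, transformer, and serializer.

```python
import xml.etree.ElementTree as ET

def generate(text):
    """Generator stage: turn raw input into an XML tree."""
    return ET.fromstring(text)

def transform(root):
    """Transformer stage: rewrite the tree (here, uppercase every title)."""
    for title in root.iter("title"):
        title.text = (title.text or "").upper()
    return root

def serialize(root):
    """Serializer stage: emit a simple HTML fragment (no escaping; toy input only)."""
    items = "".join(f"<li>{t.text}</li>" for t in root.iter("title"))
    return f"<ul>{items}</ul>"

def run_pipeline(source, stages):
    """Pass the data through each stage in order."""
    data = source
    for stage in stages:
        data = stage(data)
    return data

html = run_pipeline(
    "<news><title>Cocoon 2.1.8</title><title>XOM 1.1b6</title></news>",
    [generate, transform, serialize],
)
print(html)
```

Swapping the last stage for a different serializer changes the output format without touching the earlier stages, which is the point of the Lego-like approach.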

Friday, November 18, 2005 (Permalink)

I've posted my slides from Wednesday's talk on Testing XML at STARWest.

Automatic Update just informed me that the Mozilla Project has posted the third release candidate of Firefox 1.5. If you've been using one of the 1.5 betas, it may already be downloading now as you read this. New features in 1.5 for XML developers include:

CSS @-moz-document selector for matching on site/document URL, useful in user stylesheets
SVG support is now turned on by default.
XML Events in JavaScript
An extension to support XForms
CSS 2 quotes support
CSS 2 counters support
Lots of neat CSS 3 features
URIs are now always encoded in UTF-8

Other new features in 1.5 include:

Sanitize "provides an easy way to quickly remove browsing history, cookies, cache, saved form information, and other personal data. The items to be removed can be customized, and the feature can be activated using either a keyboard shortcut or through a menu item."
When viewing images, tab icons now display thumbnails of the displayed image.
"Much faster session history navigation. The feature is off by default but can be enabled for testing purposes by setting the browser.sessionhistory.max_viewers preference to a nonzero number."
FTP users are prompted for a name and password if anonymous access fails.
Report a broken website wizard
Changes made in the Preferences window now apply immediately
Searchable download actions manager
Searchable cookie manager

I've been using this version since the alpha 2 release a few months ago. It feels a little slower than 1.0 but otherwise has been quite stable.

Thursday, November 17, 2005 (Permalink)

IDEAlliance has posted the Call for Papers for XTech 2006 taking place in Amsterdam, May 16-19. XTech is the primary European XML show every year. January 9 is the deadline for submissions. It's a nice show, but I probably won't go this year. Maybe 2007.

The W3C Multimodal Interaction Working Group has posted the last call working draft of the Delivery Context: Interfaces (DCI) Accessing Static and Dynamic Properties (formerly "Dynamic Properties Framework"). "This document defines platform and language neutral interfaces that provide Web applications access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions."

Friday, November 11, 2005 (Permalink)

The W3C Web Services Choreography Working Group has posted the candidate recommendation of Web Services Choreography Description Language Version 1.0. According to the abstract,

The Web Services Choreography Description Language (WS-CDL) is an XML-based language that describes peer-to-peer collaborations of participants by defining, from a global viewpoint, their common and complementary observable behavior; where ordered message exchanges result in accomplishing a common business goal.

The Web Services specifications offer a communication bridge between the heterogeneous computational environments used to develop and host applications. The future of E-Business applications requires the ability to perform long-lived, peer-to-peer collaborations between the participating services, within or across the trusted domains of an organization.

The Web Services Choreography specification is targeted for composing interoperable, peer-to-peer collaborations between any type of participant regardless of the supporting platform or programming model used by the implementation of the hosting environment.

Thursday, November 10, 2005 (Permalink)

I've posted the sixth beta of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath queries, document subset canonicalization, exclusive XML canonicalization, external XSLT parameters, and xml:id support. The API is now considered to be stable, and probably won't change before 1.1 final. Beta 6 fixes two bugs in the Canonicalizer and one bug in the Builder that could lead to malformed documents. Barring discovery of any more bugs, this should be the last beta before the final release of XOM 1.1. XOM requires Java 1.2 or later and is published under the LGPL.

Wednesday, November 9, 2005 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group has posted updated working drafts on SKOS, the Simple Knowledge Organisation System. "SKOS Core provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, 'folksonomies', other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies. The SKOS Core Vocabulary is an application of the Resource Description Framework (RDF), that can be used to express a concept scheme as an RDF graph. Using RDF allows data to be linked to and/or merged with other data, enabling data sources to be distributed across the web, but still be meaningfully composed and integrated." The SKOS Core Guide "is a guide using the SKOS Core Vocabulary, for readers who already have a basic understanding of RDF concepts." The SKOS Core Vocabulary Specification "gives a reference-style overview of the SKOS Core Vocabulary as it stands at the time of publication. It also describes the policies for ownership, naming, persistence and change by which the SKOS Core Vocabulary is managed."

The W3C Web Services Description Working Group has posted the first public working draft of Web Services Description Language (WSDL) Version 2.0: RDF Mapping. "Web Services Description Language (WSDL) provides a model and an XML format for describing Web services. This document describes a representation of that model in the Resource Description Language (RDF) and in the Web Ontology Language (OWL), and a mapping procedure for transforming particular WSDL descriptions into their RDF form."

The W3C RDF Data Access Working Group has published the first working draft of SPARQL Protocol for RDF Using WSDL 1.1. "The RDF Data Access Working Group normatively defines the SPARQL Protocol for RDF via a Web Services Description Language version 2.0 (WSDL 2.0) definition. This document presents a non-normative WSDL 1.1 document defining the same protocol."

The W3C has published the first public working draft of Scope of Mobile Web Best Practices. "To help frame the development of 'best practices' for the mobile Web this document - created by the members of the Mobile Web Initiative Best Practices Working Group (BPWG) as an elaboration of its charter - identifies the nature of problems to be solved, outlines the scope of work to be undertaken and specifies the assumptions regarding the target audience and the anticipated deliverables."

Tuesday, November 8, 2005 (Permalink)

Sleepycat Software has released Berkeley DB XML 2.2, an open source "application-specific, embedded data manager for native XML data" based on Berkeley DB. It supports the April 2005 working drafts of XQuery 1.0 and XPath 2.0 (not the more recently released Candidate Recommendations). It includes C++, Java, Perl, Python, TCL and PHP APIs. New features in 2.2 include:

Node level indexes to improve query performance for large XML documents
Query plan generation and indexing optimization
Improved resource utilization for node storage containers
New index lookup functions

Monday, November 7, 2005 (Permalink)

Benjamin Pasero has released RSSOwl 1.2, an open source RSS reader written in Java and based on the SWT toolkit. Version 1.2 adds assorted new features, most importantly support for Atom 1.0. RSSOwl is the best open source RSS client I've seen written in Java. That said, it still doesn't feel right to me. Even ignoring various small user interface inconsistencies, news just doesn't flow in this client. The goal of an RSS news reader is to help you get through large quantities of information quickly. RSSOwl doesn't do that. The biggest problem is that it doesn't treat read and unread items differently. If I've already read a news item, I don't want it to keep showing up when I'm paging through the news with the arrow key. I can use Command-N (Ctrl-N for Windows/Linux folk) to go to the next unread item. However, that's not enough. The arrow key and the space bar should both advance to the next unread item, rather than the next item. Furthermore, previously read items should be hidden by default. Usenet news readers have acted like this for years. Even Thunderbird can do this (and that feature has been a huge help in organizing and responding to my email). Why can't RSS readers do the same? (Actually, some can but these are all either closed source, or operate on platforms I don't use.)

Sunday, November 6, 2005 (Permalink)

Worldwide browser market share for Firefox has now crossed 10%. In the United States it's as high as 14%, and among techies it's climbing toward 50%. If I were Microsoft I'd be very, very worried.

Friday, November 4, 2005 (Permalink)

The W3C XQuery and XSL working groups have published eight candidate recommendations for XQuery, XSLT 2 and XPath 2:

Two others have been updated but have not yet reached last call:

Michael Kay has released version 8.6 of Saxon, his XSLT 2.0 and XQuery processor. This release updates Saxon to support the latest candidate recommendations of XSLT 2 and XQuery. Saxon is published in two versions, both of which require Java 1.4 or later. Saxon 8.6B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.6SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standard XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Toni Uusitalo has released Parsifal 1.0, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Parsifal is in the public domain.

Thursday, November 3, 2005 (Permalink)

Altsoft N.V. has released Xml2PDF 2.4, a $49 payware Windows program for converting XSL-FO, SVG, WordML, and XHTML documents into PDF files. New features in 2.4 include:

inside and outside floats;
extension for creating PDF layers based on z-indexes;
extension for adding files as an attachment to PDF document;
extension for adding PDF document properties;
SVG text paths;
WordML merge fields

Wednesday, November 2, 2005 (Permalink)

The W3C Internationalization Working Group has published a new working draft of Character Model for the World Wide Web 1.0: Fundamentals. This "provides authors of specifications, software developers, and content developers with a common reference for early uniform normalization and string identity matching to improve interoperable text manipulation on the World Wide Web....The main difference from previous versions of this document is that it no longer proposes to rely exclusively on Early Uniform Normalization."

Automatic Update just informed me that the Mozilla Project has released Firefox 1.5. (Looking at the web site it's actually Firefox 1.5 RC 1, not quite the final release version.) If you've been using one of the 1.5 betas, it may already be downloading now as you read this. New features in 1.5 for XML developers include:

CSS @-moz-document selector for matching on site/document URL, useful in user stylesheets
SVG support is now turned on by default.
XML Events in JavaScript
An extension to support XForms
CSS 2 quotes support
CSS 2 counters support
Lots of neat CSS 3 features
URIs are now always encoded in UTF-8

Other new features in 1.5 include:

Sanitize "provides an easy way to quickly remove browsing history, cookies, cache, saved form information, and other personal data. The items to be removed can be customized, and the feature can be activated using either a keyboard shortcut or through a menu item."
When viewing images, tab icons now display thumbnails of the displayed image.
"Much faster session history navigation. The feature is off by default but can be enabled for testing purposes by setting the browser.sessionhistory.max_viewers preference to a nonzero number."
FTP users are prompted for a name and password if anonymous access fails.
Report a broken website wizard
Changes made in the Preferences window now apply immediately
Searchable download actions manager
Searchable cookie manager

"This release does not contain any major new features since Beta 1. Improvements to automated update system, Web site rendering and performance, along with several security fixes are included in this release. Beta 1 users that want to help test software update, should wait for the automatic update to be triggered sometime in the next few days. The incremental update from Beta 1 to Beta 2 is 700K bytes." I've been using this version since the alpha 2 release a few months ago. It feels a little slower than 1.0 but otherwise has been quite stable.

Monday, October 31, 2005 (Permalink)

The XML Apache Project has released Xalan-C++ 1.10, an open source XSLT processor written in standard C++. Version 1.10 (That's "one dot ten", not "one dot one dot zero.") adds support for XML 1.1 and Namespaces in XML 1.1, upgrades Xerces-C to version 2.7, and fixes assorted bugs.
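
The "one dot ten" point matters to anyone sorting release numbers in code. Here's a minimal sketch, assuming purely numeric dotted versions; version_key is a hypothetical helper, and it makes no attempt to handle suffixes such as "beta 6" or "r2".

```python
def version_key(version):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))

# Compared component by component, 1.10 comes after 1.9.
print(version_key("1.10") > version_key("1.9"))

# Compared as plain strings, "1.10" sorts before "1.9".
print("1.10" > "1.9")
```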

The W3C has opened an XML Processing Model Working Group. According to the charter,

The goals of the XML Processing Model Working Group are to develop two Recommendation Track documents:

An XML Processing Language which answers the following questions:

What is to be done to a given document or a set of documents by a given sequence of given XML processors?

Which data model (XML Information Set, PSVI, XPath 1.0, XQuery 1.0 and XPath 2.0) is manipulated by each transformation process?

How are exceptions handled during processing?

What is the expected outcome after processing?

An XML Processing Model which answers the following questions:

Which if any of the transformations signalled by aspects of an XML document should be performed, and in what order? Examples of transformations include, but are not limited to, XInclude, XML Canonicalization (and/or Exclusive Canonicalization), XSLT, xml:id, XML Signature and XML De/Encryption.

How can an author, consumer, or application guide this process?

In the absence of any guidance, what default processing, if any, should be done in what circumstances?

What will the impact of a default processing model be on existing XML documents and processors, in particular DOM implementations?

It is also expected that the Working Group will take into consideration potential consequences of processing XML documents represented by alternative (e.g. efficient XML interchange) serializations, or bundled together as compound documents or with other packaging methods.

The Omni Group has released OmniWeb 5.1.2, a $29.95 payware web browser for Mac OS X. OmniWeb 5.x is based on the same KHTML engine Safari uses. However, it has less XML support than Safari 2.0. It can handle CSS but not XSLT.

Friday, October 21, 2005 (Permalink)

I'll be travelling for the next week. Updates here will be slow to nonexistent until next month.

The W3C Internationalization Core Working Group, XQuery Working Group, and XSL Working Group have jointly produced a note on Working with Time Zones. "This document discusses some of the problems encountered when working with the date, time, and dateTime values from [XML Schema] when those value include (or omit) time zone offsets."
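
The include-or-omit problem is easy to reproduce. The sketch below uses Python's datetime as a stand-in for the XML Schema dateTime type; the two differ in detail (XML Schema defines its own comparison rules), so treat this as an illustration of the issue rather than of the note's recommendations. The sample timestamps are made up.

```python
from datetime import datetime, timezone

with_offset = datetime.fromisoformat("2005-10-21T09:00:00-04:00")
without_offset = datetime.fromisoformat("2005-10-21T09:00:00")

# A value that carries an offset can be normalized to UTC.
print(with_offset.astimezone(timezone.utc).isoformat())

# A value with an offset and one without cannot be ordered reliably;
# Python refuses to compare them at all.
try:
    with_offset < without_offset
    comparable = True
except TypeError:
    comparable = False
print(comparable)
```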

The OpenOffice Project has released OpenOffice 2.0, an open source office suite for Linux and Windows that saves all its files as zipped XML. New features in 2.0 include a multipane view, custom shapes, enhanced database frontend, mail merge wizard, nested tables, digital signatures, XForms, and the ability to open and save WordPerfect files. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Thursday, October 20, 2005 (Permalink)

I've posted the fifth beta of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath queries, document subset canonicalization, exclusive XML canonicalization, external XSLT parameters, and xml:id support. The API is now considered to be stable, and probably won't change before 1.1 final. Beta 5 fixes one bug in the Canonicalizer and makes a couple more small optimizations. This may be the last beta before the final release of XOM 1.1. XOM requires Java 1.2 or later and is published under the LGPL.

Netscape has released version 8.0.4 of its namesake web browser for Windows. This is a bug fix release. All users should upgrade.

Wednesday, October 19, 2005 (Permalink)

The Mozilla Project has posted the second preview of its XForms extension for Firefox 1.5. Mozilla XForms support has been developed over the last year by IBM, Novell, and independent contributors.

The first alpha of FormFaces, a pure JavaScript XForms processor, has been posted. In FormFaces, "JavaScript transcodes the XForms controls to HTML form controls and processes the binding directly within the browser - requiring ZERO server-side processing and ZERO plug-ins." Supported browsers include recent versions of Internet Explorer, Netscape, Mozilla, Firefox, Opera, Konqueror, Safari, and NetFront. FormFaces is licensed under the GPL.

The xframe project has released xsddoc 1.0, an open source documentation generator for W3C XML Schemas based on XSLT. xsddoc generates JavaDoc-like documentation of schemas. Java 1.3 or later is required.

x-port.net has released formsPlayer 1.3.5.1018, a free-beer (e-mail address required) XForms processor that "only works in Microsoft's Internet Explorer version 6 SP 1." This release adds an XPath function that allows a node to be tested for validity and the EXSLT evaluate function.

YesLogic has released Prince 5.0 r2, a $349 payware batch formatter for Linux, Windows, and Mac OS X that produces PDF and PostScript from XML documents with CSS stylesheets. This is a bug fix release.

Tuesday, October 18, 2005 (Permalink)

XimpleWare has released VTD-XML 1.0, a free (GPL) non-extractive Java library for processing XML that supports XPath. This appears to be an example of what Sam Wilmot calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage. Current tree models typically require at least 3 times the size of the actual document, more often more. Using a model based on indexes into one big array might allow these to reduce their requirements to twice the size of the original document or even less. VTD-XML claims 1.3 times, but I haven't verified that.
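
To make the index idea concrete, here's a toy sketch in Python. It is not VTD-XML's record format or API, which I haven't inspected; index_start_tags and name_at are hypothetical names, and the regular expression ignores comments, CDATA sections, DTDs, and everything else a real parser has to handle. The only point is that the "tokens" are (offset, length) pairs into the original text rather than copied strings.

```python
import re

# Toy pattern: the name that follows "<" in a start tag.
START_TAG = re.compile(r"<([A-Za-z_][\w.\-]*)")

def index_start_tags(document):
    """Return (offset, length) pairs pointing at element names in document."""
    return [(m.start(1), m.end(1) - m.start(1))
            for m in START_TAG.finditer(document)]

def name_at(document, token):
    """Materialize a name only when someone actually asks for it."""
    offset, length = token
    return document[offset:offset + length]

doc = "<feed><entry><title>XOM</title></entry></feed>"
tokens = index_start_tags(doc)
names = [name_at(doc, t) for t in tokens]
print(tokens)
print(names)
```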

However VTD-XML currently "only supports built-in entity references(" & ' > <)." That means it's not an XML parser. Given this, comparisons to other parsers are unfair and misleading. I've seen many products that outperform real XML parsers by subsetting XML and cutting out the hard parts. It's often the last 10% that kills the performance. :-( The other question I have for anything claiming these speed gains is whether it correctly implements well-formedness testing, including the internal DTD subset. Will VTD-XML correctly report all malformed documents as malformed? Has it been tested against the W3C XML conformance test suite? I'm not sure.
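
For a sense of what handling only the built-in entities leaves out, here's a small check using the parser in Python's standard library as a stand-in for "a real XML parser". The sample documents and the is_well_formed helper are made up for illustration; this is nowhere near a substitute for the conformance test suite.

```python
import xml.etree.ElementTree as ET

# An entity declared in the internal DTD subset. A parser has to expand
# this, not just the five predefined entity references.
with_entity = '<!DOCTYPE r [<!ENTITY pub "XimpleWare">]><r>&pub;</r>'
expanded = ET.fromstring(with_entity).text
print(expanded)

def is_well_formed(text):
    """Report whether the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<r><a></r>"))           # mismatched tags
print(is_well_formed("<r>&undeclared;</r>"))  # undeclared entity reference
```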

Finally, even if everything works out once the holes are plugged, this seems like it would be slower than SAX/StAX for streaming use cases. VTD, like DOM, needs to read the entire document before it can work on any of it. SAX/StAX can begin processing the beginning of a document before most of the document has even arrived from the network. This isn't relevant to all use cases, but it's very relevant for many of the cases where speed is most critical and most problematic.
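
The streaming point can be demonstrated with the incremental SAX interface in Python's standard library (used here instead of a Java SAX or StAX parser purely for brevity). The two chunks simulate a document arriving from the network in pieces; ElementLogger and the sample feed are hypothetical.

```python
import xml.sax

class ElementLogger(xml.sax.ContentHandler):
    """Record element names as soon as the parser reports them."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)

handler = ElementLogger()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# First chunk arrives; the document is still incomplete.
parser.feed("<feed><entry><title>first</title></entry>")
seen_early = list(handler.seen)
print(seen_early)

# The rest arrives later.
parser.feed("<entry><title>second</title></entry></feed>")
parser.close()
print(handler.seen)
```

The handler has already processed the first entry before the second chunk exists. A model that indexes the whole document first can't start until the last byte is in.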

Monday, October 17, 2005 (Permalink)

IBM's alphaWorks has posted the XML Forms Generator, a "data-driven Eclipse plug-in generates forms that adhere to the XForms 1.0 standard, using as a starting point either Web Service Description Language (WSDL) documents or XML instance documents having optional XML Schema backing models. The generated forms adhere to the XHTML and XForms 1.0 standards and can be viewed in popular XHTML and XForms renderers." This tool is part of the Emerging Technologies Toolkit (ETTK), which is a nice way of saying this is closed source and more than likely IBM will eventually abandon it without ever making it available for production use, either as closed or open source.

Saturday, October 15, 2005 (Permalink)

The W3C Synchronized Multimedia Working Group has posted the proposed recommendation of the Synchronized Multimedia Integration Language (SMIL 2.1). SMIL 2.1 has four goals:

Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behaviour of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.
Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. For example, SMIL components are used for integrating timing into XHTML and into SVG.
Extend the functionalities contained in the SMIL 2.0 into new or revised SMIL 2.1 modules.
Define new SMIL 2.1 Mobile Profiles incorporating features useful within the mobile industry.

Changes since the candidate recommendation are mostly editorial.

Friday, October 14, 2005 (Permalink)

The W3C HTML Working Group has posted the second working draft of XFrames. According to the introduction,

Frames were introduced into HTML at version 4.0 [HTML4]. They introduced a manner of composing several HTML documents into a single view to create an application-like interface.

However, Frames introduced several usability problems that caused several commentators to advise Web site builders to avoid them at all costs. Examples of such usability problems are:

The [back] button works unintuitively in many cases.

You cannot bookmark a collection of documents in a frameset, or send someone a reference to the collection.

If you do a [reload], the result may be different to what you had.

[page up] and [page down] are often hard to do.

You can get trapped in a frameset.

Searching finds HTML pages, not Framed pages, so search results usually give you pages without the navigation context that they were intended to be in.

Since you can't content negotiate, noframes markup is necessary for user agents that don't support frames. However, almost no one produces noframes content, and so it ruins Web searches, since search engines are examples of user agents that do not support frames.

There are security problems caused by the fact that it is not visible to the user when different frames come from different sources.

This document defines a separate XML application, not a part of XHTML per se, that allows similar functionality to HTML Frames, with fewer usability problems, principally by making the content of the frameset visible in its URI.

In itself, this seems fine. However, it really feels like there's a lot of overlap with XLink, XInclude, XSL, and CSS. I think those technologies together could accomplish everything that's suggested here. If they can't, I'd really like to hear why they can't, and then ask whether it might be simpler to make a few small additions to those specs rather than inventing something completely new.

Wednesday, October 12, 2005 (Permalink)

I've updated the conferences page. I found one interesting new event, Programming Language Technologies for XML (PLAN-X) taking place January 14 in Charleston, South Carolina. This ACM sponsored workshop focuses on new programming language technologies for working with XML. I don't have anything to present to the workshop, but I'm tempted to attend anyway. If you know of any other XML-centric conferences I'm missing, please send in their info.

This Saturday I'll be at the first Weekend with Experts show in New York to talk about Effective XML. In November I'll talk about testing XML at both STPCon and STARWest. In December, the Weekend with Experts arrives in Philadelphia, where I'll talk about Effective XML one more time. In January, I'll be at the XML Developers Network of the Capital District in Albany on the 17th to talk about XOM; and on February 8, I'll be at the Capital District Java Developers Network, also in Albany, to talk about Measuring JUnit Code Coverage. Then in March it's back to Santa Clara for Software Development 2006 West. If you'd like me to talk to your user group, just send me an e-mail. I do ask that groups outside the New York City area cover my travel expenses, though sometimes we can piggy back a user group talk on top of a conference in the same general vicinity. See you there!

Tuesday, October 11, 2005 (Permalink)

buldocs has released xnsdoc 1.0, a €49 payware "documentation generator for XML namespaces defined by W3C XML Schema in HTML in a JavaDoc like visualization. xnsdoc supports all common schema design practices like chameleon, russian doll, salami slice, venetian blind schemas or circular schema references. xnsdoc can be used from the command line, as an Apache Ant Task, as an Apache Maven Plugin, as an eclipse plugin or integrated as a custom tool in many XML development tools such as StylusStudio, oXygen XML or XMLWriter."

Sunday, October 9, 2005 (Permalink)

Todd Ditchendorf has released Safari Guide 1.1, a free-as-in-beer Mac OS X application that can evaluate XPath and XQuery expressions against the current frontmost Safari webpage.

Saturday, October 8, 2005 (Permalink)

The Mozilla Project has posted the second beta of Firefox 1.5. The automatic update classifies this as Firefox 1.4.1. New features in 1.5 for XML developers include:

CSS @-moz-document selector for matching on site/document URL, useful in user stylesheets
SVG support is now turned on by default.
XML Events in JavaScript
An extension to support XForms
CSS 2 quotes support
CSS 2 counters support
Lots of neat CSS 3 features
URIs are now always encoded in UTF-8

Other new features in 1.5 include:

Sanitize "provides an easy way to quickly remove browsing history, cookies, cache, saved form information, and other personal data. The items to be removed can be customized, and the feature can be activated using either a keyboard shortcut or through a menu item."
When viewing images, tab icons now display thumbnails of the displayed image.
"Much faster session history navigation. The feature is off by default but can be enabled for testing purposes by setting the browser.sessionhistory.max_viewers preference to a nonzero number."
FTP users are prompted for a name and password if anonymous access fails.
Report a broken website wizard
Changes made in the Preferences window now apply immediately
Searchable download actions manager
Searchable cookie manager

Friday, October 7, 2005 (Permalink)

The W3C XForms Working Group has published a proposed edited recommendation of XForms 1.0. The changes are very minor overall. "There is one correction that affects conformance. Erratum E69a adds instance to the list of attributes for submission. Without this correction, the intended use of instance replacement is typically not achievable in practice. Furthermore, implementations already support this attribute. So this correction was added to align specification with implementations."

Thursday, October 6, 2005 (Permalink)

YesLogic has released Prince 5.0, a $349 payware batch formatter for Linux, Windows, and Mac OS X that produces PDF and PostScript from XML documents with CSS stylesheets. New features in 5.0 include Unicode, PDF links, bookmarks and security, footnotes, cross-references and CSS positioning. This beta adds support for the word-spacing, visibility, empty-cells, and clip CSS properties.

Tuesday, October 4, 2005 (Permalink)

Todd Ditchendorf has released XML Nanny 1.1, a free-as-in-beer Mac OS X "tool that provides an Aqua interface for checking XHTML and XML documents for Well-Formedness and Validity either locally or across the network." Version 1.1 adds support for W3C XML Schemas, improves error messages, and reports multiple errors in a single document. XML Nanny is based on Xerces-C.

Monday, October 3, 2005 (Permalink)

The W3C has opened public-schemata-users, a mailing list "for discussions between users/writers of schemata in any language (W3 XML Schema, RelaxNG, for example); in particular authors of modular and reusable schemata. Discussion of ways to combine schemata produced by different groups (such as NVDL), authoring best practices, and practical aspects such as level of support in different tools, are all on topic." Subscribe by sending a blank email to public-schemata-users-request@w3c.org.

Saturday, October 1, 2005 (Permalink)

The W3C Synchronized Multimedia Working Group has posted the proposed recommendation of the Synchronized Multimedia Integration Language (SMIL 2.1). SMIL 2.1 has four goals:

Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behaviour of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.
Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. For example, SMIL components are used for integrating timing into XHTML and into SVG.
Extend the functionalities contained in the SMIL 2.0 into new or revised SMIL 2.1 modules.
Define new SMIL 2.1 Mobile Profiles incorporating features useful within the mobile industry.

Differences from the candidate recommendation are claimed to be "clarifications but no major changes."

Friday, September 30, 2005 (Permalink)

I've posted the fourth beta release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath queries, document subset canonicalization, exclusive XML canonicalization, external XSLT parameters, and xml:id support. The API is now considered to be stable, and probably won't change before 1.1 final. Beta 4 fixes a few bugs here and there, especially in SAX conversion. The fat version of the Text class is also working again. This may be the last beta before the final release of XOM 1.1. XOM requires Java 1.2 or later and is published under the LGPL.

Wednesday, September 28, 2005 (Permalink)

I've posted the notes from today's talk at SD Best Practices on Testing XML. I'll be repeating this one at STARWest in L.A. and STPCon in New York in November.

Tuesday, September 27, 2005 (Permalink)

The Modis Group has posted Sedna 0.5, an open source native XML database for Windows written in C++ and Scheme and published under the Apache License 2.0. Sedna has partial support for XQuery and its own declarative update language.

Monday, September 26, 2005 (Permalink)

This week I'm at Software Development Best Practices in Boston. I'll be talking about a number of topics including Effective XML, Testing XML, JUnit 4, Human Factors in API Design, Next Generation Web Clients, and GUI Testing with Abbot and Costello. Updates may be a little slow here in the meantime.

I've posted beta 8 of Jaxen 1.1, an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Jaxen was originally written by James Strachan and Bob McWhirter. Beta 8 expands the JavaDoc, cleans up and optimizes various parts of the code base, and addresses a couple of areas where Jaxen wasn't correctly implementing the XPath specification. It also makes it a little easier not to include the Jaxen extension functions if you don't want to.

The only remaining known bugs involve namespace handling in the JDOM navigator. Don't be fooled by the "beta" designation. This release has many fewer bugs and is much more conformant to the XPath specification than the official 1.0 release. We'll probably get around to calling it 1.1 final sometime later this year after doing more work on testing, documentation, performance, and code cleanup. However, there's no reason to wait for that. If you're using Jaxen, you should upgrade to this beta.
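
For readers who haven't used an XPath engine, here is roughly what evaluating a path expression against an in-memory tree looks like. This is only a sketch: it uses Python's standard ElementTree, which understands a small subset of XPath 1.0, and it is not Jaxen's API. The sample document and the `titles_for_year` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<catalog>
  <book year="2003"><title>Alpha</title></book>
  <book year="2002"><title>Beta</title></book>
</catalog>"""


def titles_for_year(xml_text, year):
    """Return the titles of all book children whose year attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree supports child steps and simple attribute predicates
    # such as [@year='2003'], but not the full XPath 1.0 language.
    path = "./book[@year='{}']/title".format(year)
    return [title.text for title in root.findall(path)]


print(titles_for_year(DOC, "2003"))
```

A full XPath 1.0 engine also handles axes, functions, and numeric predicates that this subset does not.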

Sunday, September 25, 2005 (Permalink)

The W3C Multimodal Interaction working group has posted the fifth public working draft of EMMA: Extensible MultiModal Annotation markup language. According to the abstract, this spec "provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers."

Friday, September 23, 2005 (Permalink)

The W3C Quality Assurance Working Group has published a note on Test Metadata. According to the note,

To be truly useful, a test suite should consist of more than a simple collection of tests. Additional information is typically required to help users understand how to execute the tests and how to interpret the results. Much of this information should be provided in the test suite documentation but some is more appropriately provided in the form of metadata about the tests themselves. Well-defined metadata can help in:

tracking tests during the development and review process

filtering tests according to a variety of criteria (for example, whether or not they are applicable for a particular profile or optional feature)

identifying the area of the specification that is tested by the tests

constructing a test harness to automatically execute the tests

formatting test results so that they are easily understood

Most test suites provided by W3C Working Groups make use of some form of metadata. However, the extent of metadata usage and the forms and syntax in which metadata elements are defined varies widely from Group to Group. This document defines a minimal set of metadata elements that have proved useful in practice and attempts to standardize their names, syntax, and usage. If the use of standard metadata elements is adopted within the W3C it is likely that standardized tools will be developed to facilitate the tasks listed above.

Wednesday, September 21, 2005 (Permalink)

The Mozilla Project has posted the first alpha of Sea Monkey, "a community effort to deliver production-quality releases of code derived from the application formerly known as 'Mozilla Application Suite'. Whereas the main focus of the Mozilla Foundation is on Mozilla Firefox and Mozilla Thunderbird, our group of dedicated volunteers works to ensure that you can have 'everything but the kitchen sink'". XML support includes SVG, XSLT, CSS, and XHTML.

Opera Software has released version 8.5 of their namesake web browser for Windows, Linux, and the Mac. Opera is now free-beer: no ads, no license fee. Opera supports XML, XHTML, and CSS but not XSLT.

The Mozilla Project has released version 1.0.7 of Firefox, the open source web browser that is rapidly gaining on Internet Explorer. Firefox supports HTML, XHTML, CSS, XSLT, and simple XLinks. MathML and SVG aren't supported out of the box, but can be added. 1.0.7 plugs several security holes and fixes bugs. All users should upgrade.

Tuesday, September 20, 2005 (Permalink)

SyncroSoft has released version 6.2 of the <Oxygen/> XML editor. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. New features in 6.2 include dockable views, conditional breakpoints, XPath 2.0 Enabled Schematron, incremental search, context sensitive help, a UDDI registry browser, and spell check as you type. Oxygen costs $298 with support. Upgrades from 6.0 are free.

Monday, September 19, 2005 (Permalink)

The W3C XQuery working group has published eleven updated working drafts. Eight of them are in pre-candidate-recommendation stage and one is expected to be published as a note:

Two others have not yet reached last call:

Sunday, September 18, 2005 (Permalink)

The W3C RDF Data Access Working Group has published the last call working draft of SPARQL Protocol for RDF. "The RDF Query Language SPARQL expresses queries over RDF graphs. This document employs WSDL 2.0 to define a protocol for conveying those queries, as well as other operations, to an RDF query processing services and conveying the results of such queries and operations to the entity that requested them. This document also describes an RDF vocabulary for describing the capabilities and characteristics of RDF query processors."

Friday, September 16, 2005 (Permalink)

The Mozilla Project has posted the first alpha of Camino 1.0, a Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. Version 1.0 mostly fixes bugs and speeds up the browser. Camino is free for Mac OS X 10.2 through 10.4. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Mac OS X 10.2 or later is required.

The Omni Group has released OmniGraffle 4, a general purpose Mac OS X diagramming tool and my UML editor of choice. New features in 4.0 include Bezier curves, SVG export, and improved import and export to/from PICT and Visio. OmniGraffle ranges from $79.95 to $149.95.

Wednesday, September 14, 2005 (Permalink)

The OpenOffice Project has released OpenOffice 1.1.5, an open source office suite for Linux and Windows that saves all its files as zipped XML. "OpenOffice.org 1.1.5 introduces import support for documents, spreadsheets and presentations in OpenDocument format. The OpenDocument format is an XML based international office document standard approved by OASIS, the Organisation for the Advancement of Structured Information Standards. XML based, the OpenDocument format enables the free exchange of data between compliant software packages." OpenOffice is licensed under the LGPL.

Tuesday, September 13, 2005 (Permalink)

Ernst de Haan has posted xmlenc 0.49, an open source library for streaming XML output. It's marginally more convenient than System.out.println(). This release fixes bugs and adds test cases.
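
To show what a streaming output library buys you over raw print statements, here is a minimal sketch using `XMLGenerator` from Python's standard library. It is not xmlenc's API (xmlenc is a Java library); the `write_items` helper and the element names are assumptions made up for the example. The main point is that the writer escapes reserved characters for you and emits tags as it goes, without building a tree.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_items(out, items):
    """Stream a flat <items> document to a text stream, one element at a time."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("items", {})
    for text in items:
        gen.startElement("item", {})
        gen.characters(text)  # escapes &, <, and > in the text
        gen.endElement("item")
    gen.endElement("items")
    gen.endDocument()


buf = io.StringIO()
write_items(buf, ["a < b", "AT&T"])
output = buf.getvalue()
print(output)
```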

Monday, September 12, 2005 (Permalink)

I've posted the third beta release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath queries, document subset canonicalization, exclusive XML canonicalization, external XSLT parameters, and xml:id support. The API is now considered to be stable, and probably won't change before 1.1 final. Beta 3 focuses on performance. This release is measurably faster than beta 2 for many common operations, and probably at least twice as fast as XOM 1.0. This is probably the penultimate beta before the final release of XOM 1.1. XOM requires Java 1.2 or later and is published under the LGPL.

Sunday, September 11, 2005 (Permalink)

Engage Interactive has released DOMIT! 1.1, a free-as-in-speech (LGPL) DOM implementation for PHP. Version 1.1 changes error handling and logging including a default mode where DOMIT no longer dies on an error. I'm not quite sure what that means, but it sounds like it might be incompatible with XML. Draconian error handling is a feature, not a bug.
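
For anyone unsure what draconian error handling means in practice, here is a minimal sketch in Python (not DOMIT!, which is PHP). A conforming parser stops at the first well-formedness error rather than guessing what the author meant. The `is_well_formed` helper name is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on any well-formedness error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser refuses to continue; it does not try to repair the input.
        return False


print(is_well_formed("<a><b/></a>"))
print(is_well_formed("<a><b></a>"))
```

The second document has a mismatched end-tag, so the parser rejects it outright instead of recovering.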

Saturday, September 10, 2005 (Permalink)

The W3C XML Core Working Group has released the final recommendation of xml:id Version 1.0. This spec defines an xml:id attribute that should always be recognized as an ID, regardless of the presence or absence of a DTD or schema. There've been no substantive changes in the spec since the proposed recommendation a few months ago, just editorial cleanups.

Unfortunately, this scheme is pretty badly incompatible with canonical XML, which likes to inherit attributes in the XML namespace onto descendant elements, thus moving xml:id's from one element to another. This has downstream effects on XML digital signatures and XML encryption. Canonical XML should not have assumed that all attributes in the XML namespace would act like xml:lang and xml:space; but it did; and canonical XML is now a four-year-old deployed recommendation; so now we're stuck with another messy inconsistency between specs. At some point, we're really going to need to go through all of these and sand down these rough edges to produce XML 2.0.

It will take a while for software to catch up to this. XOM 1.1 is one of the few libraries that does already recognize xml:id. xml:id was the last blocker before final release. Now that the final version of xml:id is out, expect a release candidate of XOM 1.1 in a few days. (There are also a couple of inconsistencies with Xerces-J 2.7.1 I have to work out.)
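
To make the xml:id idea concrete, here is a rough sketch of looking an element up by its xml:id with no DTD or schema in sight. It uses Python's ElementTree, which has no built-in xml:id support as far as I know, so the lookup is a plain tree walk. The `element_by_xml_id` helper and the sample document are assumptions for illustration, and the sketch skips the checks a real xml:id processor makes (that values are legal NCNames and unique in the document).

```python
import xml.etree.ElementTree as ET

# The xml prefix is always bound to this namespace; no declaration is needed.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def element_by_xml_id(root, wanted):
    """Return the first element whose xml:id equals wanted, or None."""
    for elem in root.iter():
        if elem.get(XML_ID) == wanted:
            return elem
    return None


# Hypothetical sample document with no DTD or schema.
DOC = ('<doc><section xml:id="intro"><title>Intro</title></section>'
       '<section xml:id="usage"/></doc>')
root = ET.fromstring(DOC)
print(element_by_xml_id(root, "intro").find("title").text)
```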

Friday, September 9, 2005 (Permalink)

The Mozilla Project has posted the first beta of Firefox 1.5, formerly known as Firefox 1.1. New features in 1.5 for XML developers include:

CSS @-moz-document selector for matching on site/document URL, useful in user stylesheets
SVG support is now turned on by default.
XML Events in JavaScript
An extension to support XForms
CSS 2 quotes support
CSS 2 counters support
Lots of neat CSS 3 features
URIs are now always encoded in UTF-8

Other new features in 1.5 include:

Sanitize "provides an easy way to quickly remove browsing history, cookies, cache, saved form information, and other personal data. The items to be removed can be customized, and the feature can be activated using either a keyboard shortcut or through a menu item."
When viewing images, tab icons now display thumbnails of the displayed image.
"Much faster session history navigation. The feature is off by default but can be enabled for testing purposes by setting the browser.sessionhistory.max_viewers preference to a nonzero number."
FTP users are prompted for a name and password if anonymous access fails.
Report a broken website wizard
Changes made in the Preferences window now apply immediately
Searchable download actions manager
Searchable cookie manager

Beta 1 mostly focuses on bug fixes and performance. However, there are a few new features since alpha 2, including SVG events and JavaScript 1.6, which includes ECMAScript for XML. I've been using alpha 2 for a few months now. It feels a little slower than 1.0 but otherwise has been quite stable.

Thursday, September 8, 2005 (Permalink)

Peter Flynn has published version 4.3 of the XML FAQ. Updates in this version include:

What is XML *for*?
More on the Prolog
Minor corrections to details of parsing and validation
More examples of use of CDATA
Added details of reserved words to section on special characters
New section on conditional XML
New section on Infrequently Asked Questions (glossary of oddments that occasionally get asked about)

Wednesday, September 7, 2005 (Permalink)

The W3C Quality Assurance Working Group has updated The QA Handbook, "a non-normative handbook about the process and operational aspects of certain quality assurance practices of W3C's Working Groups, with particular focus on testability and test topics. It is intended for Working Group chairs and team contacts. It aims to help them to avoid known pitfalls and benefit from experiences gathered from the W3C Working Groups themselves. It provides techniques, tools, and templates that should facilitate and accelerate their work....This version harmonizes with the final document set of the QA Framework, including the new Test Development FAQ, and reflects some basic editorial cleanup."

Tuesday, September 6, 2005 (Permalink)

Code Synthesis has released xsd 1.1.1, an open source (GPL) W3C XML Schema language based data binding tool for C++. This is a bug fix release.

Daniel Veillard has released version 2.6.21 of libxml2, the open source XML C library for Gnome. He's also released version 1.1.15 of libxslt, the Gnome Project's XSLT library for C. These releases fix assorted bugs and may run a little faster.

Toni Uusitalo has posted Parsifal 0.9.3, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. 0.9.3 can treat validation errors as non-fatal and adds the xmlplint command line tool. Parsifal is in the public domain.

Diomidis Spinellis has released bib2xhtml 2.20, a free-as-in-speech (GPL) program that converts BibTeX files into XHTML 1.0+CSS.

Todd Ditchendorf has released XML Nanny 1.0, a free-as-in-beer Mac OS X "tool that provides an Aqua interface for checking XHTML and XML documents for Well-Formedness and Validity either locally or across the network."

Monday, September 5, 2005 (Permalink)

The W3C Web Services Description Working Group has published a note on Discussion of Alternative Schema Languages and Type System Support in WSDL 2.0. "This document captures the result of discussions by the Web Services Description Working Group regarding WSDL 2.0 type system extensibilty at the time of its publication. The Working Group normatively defines the use of XML Schema 1.0 as a type system in the WSDL 2.0 Core specification. This document sketches out the basics of extensions for Document Type Definitions (DTDs) and Relax NG."

The W3C has published the first working draft of Scope of Mobile Web Best Practices. According to the abstract,

Web access from mobile devices suffers from problems that make the Web unattractive for most mobile users. W3C's Mobile Web Initiative (MWI) proposes to address these issues through a concerted effort of key players in the mobile value chain, including authoring tool vendors, content providers, handset manufacturers, browser vendors and mobile operators.

To help frame the development of "best practices" for the mobile Web this document - created by the members of the Mobile Web Initiative Best Practices Working Group (BPWG) as an elaboration of its charter - identifies the nature of problems to be solved, outlines the scope of work to be undertaken and specifies the assumptions regarding the target audience and the anticipated deliverables.

The W3C Quality Assurance Working Group has posted the fourth working draft of Variability in Specifications. "This document analyzes how design decisions of a specification's conformance model may affect its implementability and the interoperability of its implementations. To do so, it introduces the concept of variability - how much implementations conforming to a given specification may vary among themselves - and presents a set of well-known dimensions of variability. Its goal is to raise awareness of the potential cost that some benign-looking decisions may have on interoperability and to provide guidance on how to avoid these pitfalls by better understanding the mechanisms induced by variability."

Sunday, September 4, 2005 (Permalink)

Kudos to Sun for beginning to fight the scourge of license proliferation. Effective immediately, Sun is retiring the Sun Industry Standards Source License (SISSL) under which OpenOffice has previously been published. According to Louis Suarez-Potts,

How does this move affect OpenOffice.org? As most know, OpenOffice.org code was launched under the dual banner of the SISSL and LGPL; licensees could choose which one they wanted to use, and nearly all have chosen the LGPL. Effective with the announcement that Sun is retiring the SISSL, however, OpenOffice.org will in the future only be licensed under the LGPL.

For users, the simplification means: no change. OpenOffice.org remains free to use, distribute, even sell. One can freely use it in commercial as well as government environments; nothing has changed.

For vendors, distributors, add-on and plug-in writers of OpenOffice.org: The LGPL allows for commercial distribution without affecting derived products in the same way as the GPL.

For developers and other contributors: As the code will be licensed only under the LGPL, modifications to the source must be published. (The SISSL did not require all changes to the source to be published.) As most OpenOffice.org contributors are already openly contributing to the community, we anticipate no problems. And for those who have been using the SISSL exclusively, we invite you to join us.

Friday, September 2, 2005 (Permalink)

IBM developerWorks has published my latest article, Encode your XML documents in UTF-8. In this article inspired by Google's Sitemaps service, I explain why I think it's time to stop bothering with other encodings and just choose UTF-8 once and for all. Update: After reading the article, Alex Blewitt raised a bug in Eclipse about this. If you agree that text files should be UTF-8 by default, please vote for it.
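
As a small illustration of the advice, here is one way to serialize a document as UTF-8 with an explicit encoding declaration. This is a sketch using Python's ElementTree and is not taken from the article; the `to_utf8_bytes` helper is a made-up name, and the `xml_declaration` argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def to_utf8_bytes(root):
    """Serialize an element tree as UTF-8 bytes with an XML declaration."""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


root = ET.Element("greeting")
root.text = "Grüße, κόσμε"  # non-ASCII text to show the encoding at work
data = to_utf8_bytes(root)
print(data.decode("utf-8"))
```

Parsing the bytes back yields the same text, since any conforming parser is required to support UTF-8.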

The Apache XML Project has released version 2.7.0 of Xerces-C, an open source schema validating XML parser written in reasonably cross-platform C++. Version 2.7.0 includes a number of small improvements:

Option to not generate XML Schema annotations to avoid memory bloat
Option to not perform default entity resolution when the entityResolver returns NULL.
Option to do schema-only validation even when there's a DTD.

Thursday, September 1, 2005 (Permalink)

The OpenOffice Project has posted the second beta of OpenOffice 2.0, an open source office suite for Linux and Windows that saves all its files as zipped XML. New features in 2.0 include a multipane view, custom shapes, enhanced database frontend, mail merge wizard, nested tables, digital signatures, XForms, and the ability to open and save WordPerfect files. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Tuesday, August 30, 2005 (Permalink)

Apple has released Safari 2.0.1, a web browser for Mac OS X based on the KHTML rendering engine. Safari supports direct display of XML documents with CSS and XSLT stylesheets. This release "improves website compatibility, application stability and support for 3rd party web applications." Mac OS X 10.4.2 is required.

Monday, August 29, 2005 (Permalink)

I've updated the conferences page. There are only about five relevant conferences I know of over the next year, though I suspect there are at least two more coming next summer that just haven't been officially scheduled yet.

The page is now written completely in XML and styled with XSLT. If you notice any weirdnesses in the page, holler. The old HTML version is still available; but it's now statically generated once a day from the XML version. (It's not like I change the page more often than that, and cron is just easier to set up than Cocoon or something fancier. The RSS feeds have been generated out of cron for a couple of years now, and that seems to work well.)

XML is eight years old now, and I really think it's time to start experimenting with real XML served directly to clients for a change. I'm also hopeful that it will be easier to edit and update the page than it was to maintain the old HTML page. A lot of the internal links, duplicate content, and old data can be managed automatically by XSLT so I actually have to type quite a bit less to enter a new show. Semantic markup is not just better for machines than presentational markup. Done right, it's also easier for humans to author.

Sunday, August 28, 2005 (Permalink)

Michael Kay has released version 8.5.1 of Saxon, his XSLT 2.0 and XQuery processor. Saxon is published in two versions, both of which require Java 1.4 or later. Saxon 8.5.1B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.5.1SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers." Version 8.5.1 focuses on bug fixes and optimizations.

Saturday, August 27, 2005 (Permalink)

Sun has posted version 0.3.5 of xmlroff, an open source XSL Formatting Objects to PDF and PostScript converter. xmlroff is written in C for Linux, and relies on the libxml2, libxslt, and the GLib, GObject and Pango libraries from GTK+ and GNOME (though neither GTK+ nor Gnome is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This release adds support for the fo:external-graphic element.

Friday, August 26, 2005 (Permalink)

The W3C Web Services Description Working Group has published a note describing their Discussion of Alternative Schema Languages and Type System Support in WSDL 2.0. "The Working Group normatively defines the use of XML Schema 1.0 as a type system in the WSDL 2.0 Core specification. This document sketches out the basics of extensions for Document Type Definitions (DTDs) and Relax NG."

Thursday, August 25, 2005 (Permalink)

The Fall conference season is upon us, and I am racing to finish my notes for four different shows that all have their deadlines at the end of this month. The first show is a new one, EclipseWorld, at the Roosevelt Hotel in Manhattan Monday-Wednesday next week (August 29-31). The show seems to cover pretty much everything Eclipse related from RCP to SWT to TTPP to WST to ABCDEFG to WXYZ. I'll be delivering four sessions on XML Editing in Eclipse, Macifying SWT, Test Driven Development with Eclipse, and Static Code Analysis in Eclipse.

The next show I'll be at is Software Development Best Practices in Boston (September 26-29). As well as hosting a panel on next generation client side technologies I'll be talking about Effective XML, GUI Testing, JUnit 4, Testing XML, and User Interface Principles in API Design. The show is looking for a few more volunteers to man doors, distribute notes, and similar tasks. For each day you volunteer, you get to attend the conference for a day free; and most volunteer days involve nothing more strenuous than sitting in the back of the room listening to the presentation, and collecting eval forms at the end; so really, it's a nice way to attend the show for free.

Then it's back to New York October 15-16 for the kickoff event in a new Weekend with Experts series devoted to J2EE. I'm not much of a J2EE person, but I'll fake it by talking about Effective XML.

Staying in New York (I'm really glad to see so many shows coming to New York) November 1-3 will be the Software Test & Performance Conference at the Roosevelt Hotel. Here I'll talk about Testing XML and Measuring JUnit Code Coverage.

Then I'll be travelling to Los Angeles for the first time November 14-18 for STARWest 2005. I'll be teaching a one-day intro tutorial to JUnit designed for testers, and then talking about Testing XML again.

In December, there's a tentative trip planned to Antwerp, Belgium for Javapolis. Details still remain to be worked out but I'll probably be talking about XML related subjects.

Moving even further out, next year I'll definitely be at Software Development 2006 West, March 13-17, 2006, in Santa Clara once again. I'm on the advisory board for that show and we're just starting to pick next year's sessions, but it looks like it's going to be really hot. In particular, it looks like there'll be a lot of AJAX, scripting, and web app content in addition to the usual batch of Java, C++, .NET, XML, and Testing. Plus I'm expecting to host the world's first ever 6:00 A.M. BoF.

If there are any other shows you'd like to see me at, just drop me a line or ask the show organizers to recruit me. (Believe it or not they really do read the evaluation forms you fill out at the end of every session, and if you ask for particular speakers, they will notice.) I also talk to user groups if travel expenses can be covered and they're not too far from New York City, or if they can be scheduled in conjunction with a show I'm already attending. (Contrary to popular belief, publishers do not have unlimited budgets for author tours. In fact, for midlist computer book authors, they tend to have no such budget at all. :-) ) I also do occasional corporate training on XML, test driven development, and Java.

Laid out like that, it looks like quite a lot. It always seems more doable when I'm signing up for these shows and don't have four simultaneous deadlines staring down my throat. :-) But it's usually a good time, and I hope to meet some of you there.

Wednesday, August 24, 2005 (Permalink)

Ian E. Gorman has released GXParse 1.9, a free (LGPL) Java library that sits on top of a SAX parser and provides semi-random access to the XML document. The documentation isn't very clear, but as near as I can tell, it buffers various constructs like elements until their end is seen, rather than dumping pieces on you immediately like SAX does. This release simplifies regular expression based pattern matching.
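
Here is a rough sketch of the buffering idea as I've described it above: hold each element's content until its end-tag is seen, then hand over the whole thing at once. This is my guess at the general technique, written against Python's standard SAX module, and it is not GXParse's API. The `BufferingHandler` class and its callback signature are made up for illustration.

```python
import xml.sax


class BufferingHandler(xml.sax.ContentHandler):
    """Collects each element's direct text and reports it when the end-tag arrives."""

    def __init__(self, on_element):
        super().__init__()
        self._on_element = on_element
        self._stack = []  # one (name, attributes, text chunks) entry per open element

    def startElement(self, name, attrs):
        self._stack.append((name, dict(attrs.items()), []))

    def characters(self, content):
        # SAX may deliver text in several pieces; buffer them all.
        if self._stack:
            self._stack[-1][2].append(content)

    def endElement(self, name):
        name, attrs, chunks = self._stack.pop()
        self._on_element(name, attrs, "".join(chunks))


completed = []
handler = BufferingHandler(lambda n, a, t: completed.append((n, a, t)))
xml.sax.parseString(b"<order id='7'><item>pen</item><item>ink</item></order>", handler)
print(completed)
```

Inner elements are reported before the elements that contain them, because their end-tags come first.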

Tuesday, August 23, 2005 (Permalink)

The W3C Web Services Addressing Working Group has posted the candidate recommendations of Web Services Addressing 1.0 Core and Web Services Addressing - SOAP Binding. The core spec defines abstract generic extensions to the Infoset for endpoint references and message addressing properties. The binding spec describes how the abstract properties defined in the core spec are implemented in SOAP. These specs are seeing some serious pushback within the W3C. The problem is that there already is an addressing system for the Web. It's called the URI, and it's not at all clear that web services addressing does anything beyond what URIs do except add complexity. In fact, it's pretty clear that it doesn't do anything except add complexity.

Here's the problem. Web Services Addressing "defines two constructs, message addressing properties and endpoint references, that normalize the information typically provided by transport protocols and messaging systems in a way that is independent of any particular transport or messaging system." In other words this is another example of the excessive genericity problem, just like DOM, and remember how well that worked. One of the big fundamental problems with DOM was that they tried to develop an architecture that could work for all conceivable programming languages; but developers don't want and don't need an API for all programming languages. They want an API that's tailored to their own programming language. This is why language-specific libraries like XOM and Amara are so much easier to use and more productive than DOM.

Web Services Addressing is trying to define an addressing scheme that can work over HTTP, SMTP, FTP, and any other protocol you can imagine. However, each of these protocols already has its own addressing system. Developers working with these protocols don't want and don't need a different addressing system that's marginally more compatible with some protocol they're not using in exchange for substantially less compatibility with the protocol they are using. Besides, nobody's actually doing web services over anything except HTTP anyway. Doesn't it just make more sense to use the well understood, already implemented, debugged HTTP architecture for this instead of inventing something new?

Monday, August 22, 2005 (Permalink)

The W3C Quality Assurance Working Group has posted the final recommendation of the QA Framework: Specification Guidelines. According to the abstract, "The goal of this document is to help W3C editors write better specifications, by making a specification easier to interpret without ambiguity and clearer as to what is required in order to conform. It focuses on how to define and specify conformance. It also addresses how a specification might allow variation among conforming implementations. The document presents guidelines or requirements, supplemented with good practices, examples and techniques."

Sunday, August 21, 2005 (Permalink)

The W3C Web Services Description Working Group has posted four second last call working drafts of WSDL 2.0:

Comments are due by September 19.

Friday, August 19, 2005 (Permalink)

x-port.net has released formsPlayer 1.3.5, a free-beer (e-mail address required) XForms processor that "only works in Microsoft's Internet Explorer version 6 SP 1." This release fixes bugs and improves performance.

Thursday, August 18, 2005 (Permalink)

As I read more and more blogs on a variety of subjects, (and not just blogs but other sites too) there is one consistent mistake I keep seeing again and again and it's becoming more common: lack of e-mail addresses. This is killing so many sites, and they don't even know it. If you're running a customer focused site, failure to include your contact e-mail in the usual place is customer-hostile. Guess what? I don't care about your spam paranoia. That's your problem, not mine. I certainly don't want to give my business to a company that doesn't even have the competence to set up a spam filter. Nor do I care about making sure that mail is properly categorized and routed because I'm forced to select the subject from a popup menu. If you want to route the e-mail, then hire a human being to do it on your end. Don't waste my time with that. I'll probably do it wrong anyway. When I see a site with no e-mail address, I know this is a company that is trying to keep their customers away from them. I'll go elsewhere.

The same goes for personal sites. Do you want readers to tell you about your mistakes so you can fix them, or do you want to leave them in public view on your web site for all the world to see? Do you want people to volunteer to help with your open source projects? Do you want people to send you job offers and consulting opportunities? Do you want people to offer you money and gifts? Believe it or not, people do all these things; and the people who do this are a highly self-selected bunch that you want to talk to; but if they can't find your e-mail address in the first 20 seconds or so of looking, it won't happen. On personal sites, your name and email address should be clearly visible on nearly every page you write. On corporate and organizational sites, some general email address that a person reads and forwards should be there too, but you probably won't lose too many customers if you have a link to a contacts page instead. But that's it. Make it any harder to find your email address and you're pissing away potential customers, partners, contributors, friends, investors, and other people you want to talk to. Sure, publishing your e-mail address means you'll get a few more obvious Nigerian scams and crank mail from Christian gun nuts; but it's about 10,000 times easier to delete these than it is to locate a new customer/partner/investor/friend/contributor/donor/etc. Removing your e-mail addresses from a web site is like shooting your dog to get rid of a few fleas. Now I'll get some angry e-mail from a few PETA nuts; but I can deal with that. :-)

Wolfgang Hoschek has released NUX 1.3, an open source add-on package for XOM that connects it to Michael Kay's Saxon 8 XSLT 2/XPath 2/XQuery processor, the Sun Multi-Schema Validator, and the Apache Lucene fulltext search engine. It also provides thread-safe factories and pools for creating XOM Builder objects. NUX also includes yet another non-XML binary format. Version 1.3 updates the dependent libraries, improves performance, and adds assorted utility methods here and there throughout the package. NUX is published under a modified BSD license (no advertising clause).

Wednesday, August 17, 2005 (Permalink)

A minor annoyance I keep noticing on otherwise well-designed sites including FreshDirect, SpeakEasy, and Amazon. Why can't anyone forget credit cards that have obviously expired? I understand that you may need to store the old data for purposes of verifying old purchases (though this data should probably be purged after six months or some such) but why do you keep asking me to pay with a credit card that expired two years ago? I haven't used that card for months. I've made multiple payments on other cards. Why can't you update the default card? This is just plain stupid.

The W3C SVG and CSS Working Groups have posted the fourth public working draft of SVG's XML Binding Language (sXBL). sXBL works like a literal result element used as a stylesheet in XSLT. That is, an sXBL document is an SVG document that can contain content from other namespaces. This SVG document specifies bindings between elements in those namespaces and particular SVG shapes. When an SVG processor renders the complete document, it replaces the content from other namespaces with their SVG bindings. This would be more useful if it were also possible to have external sXBL documents (more like traditional stylesheets) that don't require the source document and the SVG to be together in one place. Perhaps this will come in sXBL 2 some time down the road. Ultimately, this would allow browsers to render XML documents that don't look remotely like text, such as MathML and MusicXML. If you're curious about this, you might be interested in An early look at sXBL I wrote for IBM developerWorks.

Tuesday, August 16, 2005 (Permalink)

Engage Interactive has released DOMIT! 1.0, a free-as-in-speech (LGPL) DOM implementation for PHP. Version 1.0 adds experimental XPath support.

Recordare has released Dolet 3.0, a $119.95 payware Mac OS X Finale plug-in for reading and writing MusicXML files. This release adds support for MusicXML 1.1. Finale 2004 or later is required.

Monday, August 15, 2005 (Permalink)

The Apache Web Services Project has posted version 0.5 of JaxMe 2, an open source implementation of the Java Architecture for XML Binding. Quoting from the web page,

JaxMe 2 is an open source implementation of JAXB, the specification for Java/XML binding.

A Java/XML binding compiler takes as input a schema description (in most cases an XML schema but it may be a DTD, a RelaxNG schema, a Java class inspected via reflection or a database schema). The output is a set of Java classes:

A Java bean class compatible with the schema description. (If the schema was obtained via Java reflection, then the original Java bean class.)

An unmarshaller that converts a conforming XML document into the equivalent Java bean.

Vice versa, a marshaller that converts the Java bean back into the original XML document.

In the case of JaxMe, the generated classes may also

Store the Java bean into a database. Preferably an XML database like eXist, Xindice, or Tamino, but it may also be a relational database like MySQL. (If the schema is sufficiently simple. :-)

Query the database for bean instances.

Implement an EJB entity or session bean with the same abilities.

Version 0.5 maps xs:extension to Java inheritance and adds support for mixed content. Excuse me? You mean it didn't support mixed content before now? No matter how many times it happens, I remain amazed when I see products advertise support for basic, sine qua non parts of XML like mixed content or Unicode as new features in the latest version. There are certain companies I've learned to expect this from (and whose press releases I'm very skeptical of as a result) but normally Apache is better than this. OK, this is still only an 0.5 release; and the last version was 0.4 so it's not really all that bad. Nonetheless I think if your tool doesn't support XML, (and if you don't support mixed content you don't support XML) you really shouldn't be advertising it until you fix that.
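For anyone unsure what mixed content means in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than JaxMe itself. The sample paragraph and the flatten helper are my own illustration, not anything from the JaxMe documentation. The point is that text can sit before, between, and after child elements, and a tool has to keep all of it in order.

```python
import xml.etree.ElementTree as ET

# A paragraph with mixed content: text interleaved with a child element.
doc = ET.fromstring("<p>Mixed <em>content</em> is ordinary XML.</p>")

def flatten(element):
    """Return all the text of an element in document order, children included."""
    parts = [element.text or ""]
    for child in element:
        parts.append(flatten(child))
        # Text that follows a child element is stored in that child's "tail".
        parts.append(child.tail or "")
    return "".join(parts)

print(flatten(doc))  # Mixed content is ordinary XML.
```

A binding tool that maps each element to a bean property with a single string value has nowhere to put the " is ordinary XML." tail, which is presumably why mixed content tends to arrive late in data binding products.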

Sunday, August 14, 2005 (Permalink)

Peter Jipsen has released ASCIIMathML 1.4.7, a JavaScript program that converts calculator-style ASCII math notation and some LaTeX formulas to Presentation MathML while a Web page loads. The resulting MathML can be displayed in Mozilla-based browsers and Internet Explorer 6 with MathPlayer. Version 1.4.7 now requires only a single line of JavaScript to load.

Friday, August 12, 2005 (Permalink)

Norm Walsh has published DocBook NG: The “PTO” Release; a.k.a. beta 14 of DocBook 5.0. DocBook NG is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD generated from the RELAX NG schema is also available. "There are two significant changes in the 'PTO' release. First, the content models associated with sections have been reworked. Second, support for XInclude, first introduced in Mezcal, has been moved into a separate schema. It will be part of the DocBook schema distribution, but it will not be part of the normative DocBook schema."

Michael Smith has released version 1.69.1 of the DocBook XSL stylesheets. These support transforms to HTML, XHTML, and XSL-FO. This is a bug fix release.

Thursday, August 11, 2005 (Permalink)

FourThought has released the Amara XML Toolkit 1.0, an open source "collection of Python tools for XML processing-- not just tools that happen to be written in Python, but tools built from the ground up to use Python idioms and take advantage of the many advantages of Python." Amara includes:

Bindery: data binding tool (a very Pythonic XML API)
Scimitar: an implementation of the Schematron language that converts Schematron documents to Python scripts
domtools: set of tools to augment Python DOMs
saxtools: set of tools to make SAX easier to use in Python
Flextyper: user-defined datatypes in Python for XML processing

Python 2.3 or later is required.

The W3C RDF Data Access Working Group has published the last call working draft of SPARQL Query Results XML Format.

Wednesday, August 10, 2005 (Permalink)

Netscape has released version 8.0.3.3 of its namesake web browser for Windows. This release fixes various bugs including plugging several security holes. All users should upgrade.

Code Synthesis has released xsd 1.1.0, an open source (GPL) W3C XML Schema language based data binding tool for C++.

Tuesday, August 9, 2005 (Permalink)

Pavel Sher has posted Juxy 0.6.5, "a simple unit testing library for XSLT written in Java. Juxy allows to call or apply individual XSLT templates from Java and does not use any specific features of XSLT processor for that purposes. It relies entirely on TRaX API and should work with any TRaX compliant XSLT processor." A quick glance at the examples makes it look really ugly. The integration with Java seems to be the main problem.

By contrast, Jeni Tennison has released a set of stylesheets that are designed to support unit testing of XSLT stylesheets. In other words this tests XSLT with XSLT. That seems much more natural. On the other hand, it requires XSLT 2.0 which Juxy doesn't. A batch file is provided that runs the tests with Java. However, any XSLT 2.0 processor should do the trick. According to Tennison, "You embed tests in the stylesheet and can run a script (or configure your editing environment) to extract and run them and create an HTML report on the result." Saxon 8.4 is required to run the .bat file. (No Unix or Mac support yet, though.)

Monday, August 8, 2005 (Permalink)

Michael Kay has released version 8.5 of Saxon, his XSLT 2.0 and XQuery processor. Saxon 8.5 is published in two versions for both of which Java 1.4 or later is required. Saxon 8.5B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.5SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers." Besides bug fixes, version 8.5 adds Unicode normalization and enables the collection() function to process a directory.

Friday, August 5, 2005 (Permalink)

Day 5 of Extreme begins. This is a half day with three sessions and a closing keynote, which is plenty enough. I'm a little zoned by now. First talk this morning is IBM's Erik Hennum discussing "A Unified Type Hierarchy: A Proposal for DITA 2." DITA is the Darwin Information Typing Architecture. From his diagram it looks like another example of the rule that any problem can be solved by an additional layer of indirection. In this case, the indirection allows different topics (answers to questions) to be combined in different ways in different collections. "Maps are collections of topic references that provide a context for topics...maps aggregate topics." This enables the same content to be reused in many different contexts and collections. Caveat: despite the terminology this is not about topic maps. DITA is a hierarchy of types. Customizing XML markup to your needs prevents you from sharing the work with other groups. Instead of customizing, he wants to specialize. I'm not sure I see the difference, but it seems to be just sharing some markup and adding new custom markup instead of doing everything from scratch. Extension by substitution. This seems to be like subclassing in OOP. They can say that a steps is an ol, and a step is an li, and a taskbody is a body and so forth. They use DTDs, attributes, and XSLT to enable all this. "It's proven to be very pragmatic." "You can only constrain content models. You can't add things to content models." This is like inheritance where you can override but not add new methods. (Difference is that the overriding elements/methods don't have the same names as the methods they override.) XSLT can easily change your specialized content to the more general form, mostly just by changing element names. This is DITA as it exists today.
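The generalization step is simple enough to sketch. What follows is my own illustration in Python rather than XSLT, and the hard-coded GENERALIZE table is an assumption standing in for whatever DITA really uses to record which specialized element derives from which general one. It only uses the three pairs mentioned in the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialization table: specialized element name -> general name.
GENERALIZE = {"taskbody": "body", "steps": "ol", "step": "li"}

def generalize(element):
    """Rename specialized elements in place to their more general form."""
    for node in element.iter():
        # Names missing from the table are already general and stay as they are.
        node.tag = GENERALIZE.get(node.tag, node.tag)
    return element

task = ET.fromstring(
    "<taskbody><steps><step>Open the file.</step>"
    "<step>Save it.</step></steps></taskbody>"
)
print(ET.tostring(generalize(task), encoding="unicode"))
# <body><ol><li>Open the file.</li><li>Save it.</li></ol></body>
```

Because the content models are only ever constrained, never extended, the renamed document should still be valid against the general type. That is exactly what breaks once subtypes are allowed to add child elements.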

What else is needed? Obviously the ability to add properties to subclasses, as well as just rename them. i.e. subtypes can have child elements the supertype doesn't have.

Eric van der Vlist is discussing "RDF Query By example." He's doing a presentation with only angle brackets. Even I don't go this far. I write my notes in XML too, but I add a stylesheet to change them to HTML before displaying them.

He needed to work with LDAP, which is both a graph and a tree. "RDF is a nice way of modelling graphs."

The first (and last) real case study I've seen at this show is Jeff Beck describing "How XML made the NIH "Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research." The extreme part is the public access policy. (Not so extreme: it's voluntary.) The XML part is PubMed Central, a stable archive of NIH funded research publications. (This is not the same as PubMed, which only contains abstracts, though about 13 million of them. This contains full articles.) About 1400 NIH funded papers a week are published. Every submitted article is converted into XML (often from PDF!?) (Many are submitted in SGML.) These papers need to be accessible without plugins and on slow connections (e.g. in the field in Africa). Special characters and math were thus a problem. Unicode is part of the answer but only part. They are scanning back issues. Only 11 manuscripts are available so far. More are coming. Articles are delayed six months after original paper publication before being posted online. They validate with DTDs and XSLT.

Here's a picture of the poster that collected comments, suggestions, and RFEs for the XML randomizer I talked about on Tuesday:

Some good ideas here and some wild ones, and those are not necessarily non-intersecting sets. It occurs to me that this pretty well describes the whole conference.

Traditionally C. Michael Sperberg-McQueen gives the closing keynote, and this year is no exception. The official title is "Getting it in writing: The letter killeth, but the spirit giveth life. Or was it the other way round?"

C. Michael Sperberg-McQueen closing keynote

By the way, I apologize for the quality of some of the pictures. The lighting in the room we're in this year really seems to disagree with my camera (a Panasonic Lumix FZ5).

Thursday, August 4, 2005 (Permalink)

Good morning. Welcome to day 3 of Extreme. If you're only reading this in RSS, or in one of those browsers/aggregators that can't handle the full panoply of correct RSS <Cough>Planet XMLHack</Cough>, <Cough>Artima</Cough>, you're missing a lot. Come visit the site. And to answer one frequently asked question I'll do a full text feed for this site at such time as someone writes an RSS client that gives a user experience at least equal to a real Web browser. That means at a minimum:

It must be open source.
It must run on Mac OS X.
It must fully aggregate feeds.
It must hide previously read items.
It must fully support Atom 1.0 and XHTML.
It must not clutter the screen with irrelevant detritus.
The page up, page down, home, and end keys must all work correctly.
It must import and export OPML.
It must be at least as AppleScriptable as Firefox (which honestly isn't all that much).
It must not allow any third party to observe what I'm reading.

That's not everything, but those are some of the most common problems I've encountered; and I've tried pretty much everything that's out there. If a product satisfies criterion 1 (open source) and is written in a language and on top of a toolkit I'm comfortable with and gets pretty close to these requirements, I'd even dedicate some time to filling in the last holes. Sage is probably the closest I've come to what I want, but it's not quite there yet; and I really don't have any experience hacking on XUL. I did submit one patch to fix its font size problems, but the remaining missing features exceed my ability to fix in the time I have available.

I don't have any particular reason to prefer HTML to Atom/RSS for this site. There aren't any ads, and I rarely bother to look at my traffic stats; but I'm not going to put any effort into creating full-content feeds until there's a browser that can handle it.

This morning starts with two sessions on overlap, a perennial favorite topic here at Extreme. In fact, a brief comment I made here on Tuesday while listening to the keynotes elicited a rather lengthy explanation from one of the conference co-chairs on the conference Wiki.

This is the topic that justifies calling this conference Extreme Markup instead of Extreme XML. The overlap problem arises when things don't fit neatly into trees. For instance, suppose a quotation spans several paragraphs but begins and ends in the middle of two different paragraphs. How do you handle that? There are a number of approaches, none of them really satisfying. I wrote a little about this in Effective XML, but everything suggested there was really just a hack. One common approach is to use empty-element tags as pseudo-start and end-tags connected by attributes as in this example adapted from Syd Bauman's talk this morning:

And I said to him, Superman, have you not seen,
The embarrassment havoc I'm wreaking?

The bottom line is that overlap is just not something XML is designed to support. Really handling overlap requires a different markup language. One popular choice is Wendell Piez and Jeni Tennison's LMNL (pronounced "liminal") which stands for "Layered Markup and Annotation Language." Jeni isn't here this year, but Paul Caton is and will be talking about LMNL in the second session this morning.

Overlap tends to arise more in analysis of text rather than authoring of text. Overlap is a particular problem for scholars doing textual analysis of the Bible, Shakespeare, etc. I suspect Syd Bauman will be addressing this in the first session this morning, "TEI HORSEing around: Handling overlap using the Trojan Horse method." Overlap does not seem to be such a big deal for original authors of new work since it's normally possible to fit most content into reasonable trees without a great deal of effort, as long as you think about it up front. And even if you don't it still doesn't tend to come up that often. The author has one view of the document that does fit pretty well into a tree most of the time. For textual analysis, however, different scholars are going to want to ask different questions and thus form different trees from the same text. (As Walter Perry noted yesterday, the intent of the author is not necessarily and should not necessarily be respected by the reader.) For instance, one use case might require marking up text by speaker and another by verse. Do we need to have both trees in the same document, or is the real solution a way of combining and applying different trees over the same text?

However, there's one other use case for overlapping markup; and this is a real practical killer application, even for those of us who don't analyze classical Greek and Hebrew corpora for a living. That use case is change control and revision markers. The current state of the art for change control is CVS and Subversion, which just treat XML like any other text files. I'm not sure what the state of the art is for XML change markers; probably whatever's in OpenOffice.org's format or Microsoft's WordprocessingML. I'll have to take a look at that and see how they handle it. But whatever they do, I bet it's a kludge. I don't think there's a good solution for this problem within the bounds of standard XML.

Why does an ID have to identify a single element? For instance if a quote is spread out over several paragraphs, why can't we assign the same ID to each of the paragraphs or other elements used in the quote, and thus have an ID for the whole distributed quote?

Syd Bauman of Brown University's Women Writers Project kicks off day 3 with "TEI HORSEing around: Handling overlap using the Trojan Horse method." HORSE is the hierarchy obfuscating really spiffy encoding. He's proposing YAMFORX (Yet Another Method for Overlap Representation in XML.)

Syd Bauman at Extreme Markup Languages 2005

He's using RELAX NG to validate. He's using Schematron to validate the matching between the pseudo-start empty-element tags and the corresponding pseudo-end empty-element tags because RELAX NG can't do this. (This would make a compelling use case for extensible RELAX NG complex types validation. Someone remind me to send this to the relaxng mailing list.) He might like to extend this scheme to many elements.
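The pairing check is the sort of thing that is awkward in a grammar but easy in code. Here is a rough sketch in Python of what such a check might do. It is not Syd's Schematron, and the q element and the sID and eID attribute names are placeholders I made up for the example, as is the markup I've put around the verse.

```python
import xml.etree.ElementTree as ET

def unmatched_milestones(root, start_attr="sID", end_attr="eID"):
    """Return (start ids never closed, end ids with no open start)."""
    open_ids = []     # pseudo-start tags seen so far but not yet closed
    stray_ends = []   # pseudo-end tags that arrived without a matching start
    for node in root.iter():  # iter() visits elements in document order
        start, end = node.get(start_attr), node.get(end_attr)
        if start is not None:
            open_ids.append(start)
        if end is not None:
            if end in open_ids:
                open_ids.remove(end)
            else:
                stray_ends.append(end)
    return open_ids, stray_ends

sample = ET.fromstring(
    "<lg>"
    "<l>And I said to him, <q sID='q1'/>Superman, have you not seen,</l>"
    "<l>The embarrassment havoc I'm wreaking?<q eID='q1'/></l>"
    "</lg>"
)
print(unmatched_milestones(sample))  # ([], [])
```

Note that the quotation exists only as a convention between two empty elements. Nothing in the parsed tree says that the text between them belongs together, which is the real weakness of the approach.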

It strikes me that the real problem with the TEI approach is that it's not naturally exposed in the tools and data models used by DOM, XPath, XQuery, XSLT, etc. He recognizes this too. The pseudo-start-tags and pseudo-end-tags don't become real elements when parsed. He can swap which elements overlap which. However, this may break validity.

The second session features Paul Caton talking about "LMNL Matters?" LMNL has not seen broad adoption or development.

He wants genuine third party markup like highlighting a book. That is, the author does not know about, permit, or participate in the markup. Hence Limner.

He references Foucault as saying users tend to think in hierarchies naturally, at least post-Linnaeus.

In Q&A Wendell Piez suggests that multiple overlapping hierarchies are a subset of the overlap problem.

After the coffee break, Erich Schubert discusses "Structure-preserving difference search for XML documents" (co-authors Sebastian Schaffert and François Bry). Think diff, merge, patch, CVS, Subversion, etc. but also about humans scanning diff files to identify changes. This is where diff really falls down. Plus line-based text diffing isn't really suitable for XML. Most XML differencing uses longest common subsequence where XML tokens such as tags are used as the boundaries rather than line breaks. This algorithm focuses on producing the minimal file size, but honestly who cares about this in 2005? It was important for distributing software over Usenet in 1985, but it's just not a problem today. It's more important to focus on human factors and human comprehensibility of the diff files. He shows Logilab's XPath based xmldiff format; and he's right. It's ugly and incomprehensible.
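To show what token-based differencing looks like, here is a small sketch in Python. It is my own illustration of the conventional approach he is arguing against, not his algorithm. The regular expression tokenizer is deliberately crude (it ignores comments, CDATA, and a ">" inside attribute values), and difflib's SequenceMatcher is a stand-in: it finds matching blocks rather than computing a true longest common subsequence.

```python
import difflib
import re

# Crude tokenizer: a token is either one tag or one run of text between tags.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def tokens(xml_text):
    """Split XML text into tags and text runs, dropping whitespace-only runs."""
    return [t for t in TOKEN.findall(xml_text) if t.strip()]

def token_diff(old, new):
    """Return the non-equal edit operations between two token sequences."""
    a, b = tokens(old), tokens(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]

old = "<doc><p>Hello</p><p>World</p></doc>"
new = "<doc><p>Hello</p><note>New</note><p>World</p></doc>"
for change in token_diff(old, new):
    print(change)  # ('insert', [], ['<note>', 'New', '</note>'])
```

This works on a flat list of tokens, so nothing stops it from reporting a change that starts in the middle of one element and ends in the middle of another. That is the structure-blindness his approach is meant to avoid.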

Xcerpt query by example and the Simulation Unification algorithm inspired him. He's trying to maximize structure preservation in the diffs. They use Dijkstra's algorithm with optimized cost functions. The implementation is open source in C++ on top of libxml. The prototype is limited to XML, but the algorithms generalize to any graphs. The output format is XUpdate.

We need a lightning talk session at this conference.

Steven DeRose gives the talk I'm most looking forward to hearing this morning, "Architecture and speed of common XML operations." Optimization's always fun, if usually unnecessary; and this might give me some good ideas for speeding up XOM. (The next beta should be roughly twice as fast as the previous release, by the way. Wolfgang Hoschek found a number of inefficiencies in parsing and serialization I've now corrected.)

He starts off with a recap of basic algorithm analysis. Steve, we all know all this already. Tell us about XML! (There are ~60! particles in the universe. I didn't know that, and I sort of wanted to. Does that include photons and neutrinos or just protons, neutrons, and electrons?)

OK, he's finally started on XML 20 minutes into a 45 minute talk. Or really SGML. A naive implementation of the & operator can have exponential behavior (but better algorithms are possible.) So far this is still old news and a solved problem though.

Onto XPath: what's the speed of different query operations? Same question for DOM. These are non-trivial. XPath tends to be at least O(n) where n is the number of items in the list you're operating on. DOM does not offer preceding or following axis like XPath does. This means XPath on top of DOM has a bottleneck on the preceding and following axes. (Perhaps for this reason nobody actually does this.) If you have a list of all the nodes in the document (which DOM normally doesn't), then preceding and following are fast.

Parameters (the N in O(N)) can be the number of nodes in the document, the depth of the tree, or the number of children. In his experiments non-CALS documents tended to be about 8 deep. CALS documents were about 13 deep. Number of children runs about 1-100. Dictionaries that put all the definitions at one level may go up to 25,000 child elements or even 125,000 for an English-Danish dictionary one audience member worked on; but normally the number of children is 100 or less.

Storage models:

Raw XML
Native XML database (that encompasses a lot of options)
Relational XML

Why is he ignoring object model issues? That's an important fourth option, though there are as many or more options as in case 2.

Indexing helps for raw XML, but it imposes size limits on the documents you can handle.

Big problem with relational databases for XML is limited axis support. Plus they explode space usage. First child + next sibling is a common compromise. However it's one way. Parent and first-child also works OK. It's O(N) to get the preceding siblings. All preceding siblings is O(N²). All preceding siblings of all nodes is O(N³) on this implementation. Or you can work around it in your code. Instead gather a list of all the siblings in forward order and then run through it backwards. In other words, flipping the order of iteration through the loop may speed code up by three orders of magnitude or more. Collecting all the ancestors is also expensive on top of a relational database. (fixed point join) Dongwook Shin has a method to pack an XML structure into a relational database. This is pretty clever stuff that I can't type fast enough to summarize. Read the paper. However the insertions are expensive but you can use real numbers instead of integers to speed up insertions. And he's out of time.
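The preceding sibling trick is easy to demonstrate outside of any database. This is a toy sketch in Python under my own assumptions: an in-memory Node class with first-child, next-sibling, and parent links, which is not the relational schema from the talk. The slow version asks for the previous sibling over and over; the fast version collects the siblings in forward order once and reverses the list.

```python
class Node:
    """A node that stores only parent, first-child, and next-sibling links."""
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.first_child = None
        self.next_sibling = None

def append_child(parent, child):
    child.parent = parent
    if parent.first_child is None:
        parent.first_child = child
        return
    last = parent.first_child
    while last.next_sibling is not None:
        last = last.next_sibling
    last.next_sibling = child

def previous_sibling(node):
    """O(N): walk forward from the parent's first child until we reach node."""
    previous, current = None, node.parent.first_child
    while current is not node:
        previous, current = current, current.next_sibling
    return previous

def preceding_siblings_slow(node):
    """O(N^2): every step backward repeats the forward walk."""
    result = []
    current = previous_sibling(node)
    while current is not None:
        result.append(current)
        current = previous_sibling(current)
    return result

def preceding_siblings_fast(node):
    """O(N): gather the siblings in forward order, then run through them backwards."""
    forward = []
    current = node.parent.first_child
    while current is not node:
        forward.append(current)
        current = current.next_sibling
    return forward[::-1]

root = Node("root")
kids = [Node(letter) for letter in "abcde"]
for kid in kids:
    append_child(root, kid)
print([n.name for n in preceding_siblings_fast(kids[-1])])  # ['d', 'c', 'b', 'a']
```

Both functions return the same list. The only difference is how many links get followed along the way, and with 25,000 children that difference is enormous.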

Sometimes the most interesting things happen at lunch. David Dubin just clued me in to the non-existence of the phantom paper, an influential paper that is regularly cited in its field that was never written and never published. I've heard of urban myths and famous quotes that were never actually said; but this is the first time I've encountered a mythical paper. This paper is usually cited as having been written 30 years ago, so why are we just now discovering that it doesn't exist? Doesn't anyone check their references any more?

Mirco Hilbert and Andreas Witt (co-author Oliver Schoenfeld) are discussing "Making CONCUR Work". As Deborah LaPeyre said in her introduction, CONCUR is the feature in SGML nobody ever implemented.

The issue is non-hierarchies: multiple roots. Non-hierarchical markup (e.g. overlap) can often be implemented as multiple trees. CONCUR allowed one document to reference two different DTDs for two different, potentially overlapping hierarchies. They call their solution MuLaX (Multi-Layered XML). It's modeled on CONCUR but is not CONCUR. It looks like this:




<(1)div type="dialog" org="uniform">
  <(2)text>
    <(1)u who="Peter">
      <(2)s>Hey Paul!</(2)s>
      <(2)s>Would you give me
    </(1)u>
    <(1)u who="Paul">
      the hammer?</(2)s>
    </(1)u>
  </(2)text>
</(1)div>

They can project this onto an annotation layer that I think is real XML. The editor tool is written in C++ with wxWidgets.

The next session is "A New Paradigm of Parsing XML based on Free Cursor Mobility" (FCM) by Antonio J. Sierra. This is event based parsing (like SAX or StAX?). The difference is you can move the cursor back as well as forward. It's designed for small platforms (e.g. cell phones and PDAs). Could you implement XPath on top of this? Maybe even if not perfectly efficiently? Might be important for small devices with large documents.

The API terminology is a little idiosyncratic, brother and father instead of sibling and parent.

The idea is pretty obvious: basically it's an iterator like in a pull parser with extra methods to go backwards as well as forwards. The key is in the implementation. How will this work on streaming data over the network? It measures faster than Xparser-J (DOM) but slower than kxml2. kxml2 is the only parser in his comparison list I've heard of, and it's not state of the art.
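To make the idea concrete, here is a minimal sketch in Python of a cursor that can step backward as well as forward. It is not the FCM API, and the BufferedCursor class and its method names are invented for the example. It also cheats by buffering every event up front, which is precisely what a parser for a small device cannot afford to do, so take it as a picture of the interface rather than of the implementation.

```python
import io
import xml.etree.ElementTree as ET

class BufferedCursor:
    """A pull-style cursor over parse events that can move in both directions."""

    def __init__(self, xml_text):
        source = io.BytesIO(xml_text.encode("utf-8"))
        # Buffer (event, tag) pairs up front. A real implementation would have
        # to do something much smarter to bound its memory use.
        self._events = [
            (event, element.tag)
            for event, element in ET.iterparse(source, events=("start", "end"))
        ]
        self._position = -1

    def next(self):
        """Advance to the next event, or return None at the end of the document."""
        if self._position + 1 >= len(self._events):
            return None
        self._position += 1
        return self._events[self._position]

    def previous(self):
        """Step back one event, or return None if already at the first event."""
        if self._position <= 0:
            return None
        self._position -= 1
        return self._events[self._position]

cursor = BufferedCursor("<a><b/></a>")
print(cursor.next())      # ('start', 'a')
print(cursor.next())      # ('start', 'b')
print(cursor.previous())  # ('start', 'a')
```

Whether the real thing can offer previous() on streaming network data without buffering like this is the question I'd want answered.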

Felix Sasaki (co-authors Christian Lieske and Andreas Witt) is talking about the W3C's Internationalization and Localization Markup Requirements (which is scheduled to be published tomorrow but he gave away the URL today). More specifically he's talking about "Schema Languages & Internationalization Issues: A survey."

Felix Sasaki talks about internationalization and localization

The question is how to work internationalization and localization based markup such as bidirectional markers and Ruby into schemas. Versioning is a problem. XHTML is the poster child here. DTD modularization doesn't survive validation. W3C Schemas have problems. Architectural forms may help as may RDF. Eric van der Vlist in Q&A suggests using processing instructions instead.

In the final regular session of the day Liam Quin is catching flak for the W3C about anything and everything people want to bitch about. He hopes XSLT 2 will go to candidate recommendation in a couple of months.

Tommie Usdin is very disappointed in the "nearly all" compatibility of XSLT 2 with XSLT 1.0. She wants full compatibility. "It erodes confidence. It's an enormously big public relations mistake."

Syd Bauman wants the W3C to handle characters outside of Unicode (a very bad idea IMHO). He wants SDATA entities.

Simon St. Laurent wants to start simplifying standards again (as XML simplified SGML, XSL simplified DSSSL, and XLink simplified HyTime) and stop building huge specs by committee like Schemas, XQuery, and XSLT 2. Ann Wrightson agrees, and adds that she wants the W3C to knock vendor heads together to encourage more schema compatibility.

John Cowan wants the XML Core Working Group shut down to prevent people from tampering with the core of XML.

Scott from Boeing found that most of their suppliers didn't even know what XML was, and they had to train them. They also had problems with XML editors that couldn't handle some basic test cases.

Wednesday, August 3, 2005 (Permalink)

Day two of Extreme commences. Simon St. Laurent and Roger Sperberg are also reporting from the show, and both paid more attention yesterday to what people were actually saying than I was. I confess yesterday's sessions on OWL, RDF, Topic Maps, and UBL pretty much put me to sleep. Plus, these days I'm a morning person. I'm ready to go by 6:00 A.M. and anything after lunch is a stretch. (Welcome to middle age.) It also didn't help that two of the talks I particularly wanted to hear yesterday were cancelled. At least the DFDL replacement session was interesting. I was a little too tired to get full value out of it, but it does sound worth exploring more in the future. However, there are lots of good talks to look forward to today starting with two on XSLT and a talk from Walter Perry, one of the most iconoclastic thinkers in the XML space. He's so diametrically opposed to the conventional wisdom that most people can't even hear what he's saying. It's like trying to explain atheism to an eighth grade class in a Texas Christian school. Atheist: "I don't believe in God." Class: "You worship Satan?!" Atheist: "No, I don't believe in any gods." Class: "But that means you believe in the devil." Atheist: "No, I don't believe in the devil either." Class: "But you just said you don't believe in God." Except in Walter's case it's schemas, DTDs, and preexisting agreements he doesn't believe in instead of God and the Devil. (Disclaimer: This is just a metaphor. I have no idea what Walter's religious beliefs are.)

Ken Holman is talking about synthesizing XSLT based on his experience with the UBL stylesheets. 25 stylesheets is too many to write by hand. Instead he annotates a literal result (a hand-authored instance of XSL-FO) and generates the XSLT from that. This was important because in UBL he needs to match the printed formatting very precisely. He annotates with namespaced attributes that an XSL-FO processor will ignore. RELAX NG helped him because it could validate only the annotations and ignore everything else. This seems like a very powerful idea. I don't fully understand his syntax yet, but it doesn't look too complex.

I think I've figured out what was wrong with the camera. The autofocus only works indoors if the flash is turned on. Here's a picture of Ken presenting:

Ken Holman talking about ResultXSLT at Extreme 2005

If only I'd remembered to install iPhoto or Photoshop Elements on the PowerBook before leaving New York.

Matthijs Breebaartis from Holland is talking about "Processing references to documents you don’t have access to: Constructing identifiers with Relax NG and XSLT". The problem is a lot of information is organized into "vendor silos" and they want to be able to break the silos, and show the users what they need from across many different silos without redundancy. They need to link all this stuff but they don't control it. Different publishers have different URLs for the same things. (So much for the "Uniform" part.) They tried to get everyone in one room and agree on basic concepts. He prefers meaningful identifiers to opaque IDs. They wrote everything in RELAX NG and used Trang to translate to W3C XML Schemas to satisfy company policies. Their element names are all in Dutch, but appear to be restricted to ASCII. They use Python.

Matthijs Breebaart talks at Extreme 2005 (also seen, Eric van der Vlist, G. Ken Holman, and Elliotte's PowerBook)

Why don't the raw XML forms of the papers published on the Extreme web site have xml-stylesheet processing instructions? e.g. this one.

Ann Wrightson is talking about "Semantics of well-formed XML as a human and machine readable language."

Ann M Wrightson talking at Extreme Markup Languages 2005

The official title of Walter Perry's talk is "Indexing The Whole As Well As The Parts: Derived Schemas and Imputed Hierarchies in Document Management." He starts with a quote from Peter Murray-Rust about how the CML DTD must be flexible because we don't understand chemistry:

With CML (unlikely though it may seem) we have to have an extremely fluid DTD. That is because we don't understand chemistry. It was put well by Democritos "Nothing exists except atoms and empty space - all else is opinion". The Chemical Bond is simply an opinion and people fight about it just as much as over XML matters. So CML is increasingly becoming very sparse (atoms, bond and electrons, with a bit of geometry). That allows authors free expression.

Walter thinks we don't really understand documents, schematization, or much else; and hence need more flexibility. Schemata are effectively structural. They are interdocument in scope. They constrain lexical possibility. By contrast instance documents are hyperstructural. Schemas operate on the internal context of a document. External context includes hyperlinks, key/value indexing, processes that can consume a document, and processes that might produce a document. External contexts are significant for document search and query.

It is the process that should decide what kind of documents it can consume; not the document that decides what it can be consumed by; i.e. the document cannot specify its own type.

Indexing documents in semantic value spaces identifies bonds.

Walter Perry at Extreme Markup Languages 2005

The afternoon sessions begin with several talks about XQuery. The first is Daniela Florescu from Oracle with "Declarative XML processing with XQuery: reevaluating the big picture." She thinks we need more architectural work and less syntax sniping. She thinks we don't have a clear idea of the "final goal" of XML, and that we need one. "There is one problem: everybody likes it for different reasons." I disagree on that last point. The beauty of XML is that it solves so many goals so well, including goals no one has thought of. We don't all need to do the same thing.

She's got some interesting things to say; but she's going way too fast for me to get it all down. Her database colleagues don't believe in mixed content. (No surprises there.) Entity relationships (E/R) don't work with mixed content. "XML is the only tractable abstract information model that is not E/R based." She brings up LISP. "30-year malaise in IT infrastructure" as a result of schema dependence. XML is the first to allow instance documents to be created in advance of schema. Schemas differ from community to community. Agreeing on a schema is the most expensive step.

Daniela Florescu talks about XQuery at Extreme

The power of //*; i.e. the ability to query something without knowing where it is or what it's called. (cf. SQL). "XQuery is not a query language. In my opinion it was a very bad name." Difference between XQuery and SQL is that SQL works on a table and XQuery on a tree. (Good point, nicely stated.)
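To make the //* point concrete, here is a minimal sketch in Python rather than XQuery. It assumes the standard library's ElementTree, whose './/*' path is its spelling of the same idea. The sample document and both function names are made up for illustration; nothing here comes from Florescu's talk.

```python
import xml.etree.ElementTree as ET

# A made-up document; the code below never hard-codes its shape.
DOC = """<order>
  <customer><name>Ann</name></customer>
  <items>
    <item><name>Widget</name><price>9.50</price></item>
  </items>
</order>"""

def all_element_names(xml_text):
    """Return the tag of every element below the root, in document order.

    './/*' matches every descendant element without naming any of them
    in advance, which is the point of //* in XPath and XQuery.
    """
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.findall(".//*")]

def find_anywhere(xml_text, tag):
    """Return the text of every element called `tag`, wherever it sits."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(tag)]
```

The first function discovers what is in the tree; the second finds a name at any depth. A SQL query, by contrast, has to name its table and columns up front.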

XML/XQuery doesn't fit anywhere into the current architecture without paying a large price. Architecture needs to change or XML will fail. XQuery data model must be a first class citizen. Must make XML a graph not a tree. (I'm just taking notes here. I disagree with quite a lot though not all of this.) She wants to deprecate document nodes.

An E/R model is cyclic. No standard way to support this in XML. Only hack solutions. No global and standard solutions. We need native references in XML. This would improve integration between XML and RDF. (In Q&A it comes out that we have them in XML. That's what ID and IDREFS are. It's XQuery that's lacking here.)

She wants to deprecate xsi:nil. She wants to embed code behavior into schemas! She wants assertions (preconditions and postconditions) in schemas.

She wants continuous queries over infinite sequences in XQuery. (That might be useful.)

"XSLT is easier when the shape of the data is unknown. XQuery is easier when the shape of the data is known." (Another good point, nicely put.) Web services and XQuery don't work together closely enough. We need to make XQuery a full programming language. It's Turing complete, but inconvenient for programmers. Writing the code in Java kills the advantages of XML. She wants updates, variable assignment, error handling, and deterministic evaluation order added to XQuery.

"The industry needs to outgrow the 'XML is a syntax' myth." She's making so many points and claims so fast that she has little to no time to justify any of them. There's probably a day's worth of material she's trying to cram into 45 minutes. I see virtually no chance of her getting all (or even any) of the changes she wants, and that's probably a good thing.

Next Jonathan Robie (in a paper cowritten with Daniela Florescu) describes "XQuery Update Facility: Setting Up the Problem." He's talking about one subbullet of one bullet on one of Daniela's slides in the last talk. "We're still not certain what we need." They aren't even sure about the use cases, much less implementations and strategies. XQuery updates are "mostly ACID." Isolation is the part that makes it only mostly.

IBM's Achille Fokoue is talking about "Extracting input/output dependencies from XSLT 2.0 and XQuery 1.0." This involves mappings between schemas (input and output formats) as defined by XSLT and XQuery.

The accuracy of the mappings is a trade-off with the cost of creating the mappings. It can have exponential behavior if you aren't careful. XSLT is too tricky to handle due to recursion. XQuery is easier.

The final talk of the day is C. Michael Sperberg-McQueen on "Applications of Brzozowski derivatives to XML Schema processing." First a brief lesson in Polish on how to pronounce "Brzozowski."

A Brzozowski derivative is:

the derivative of R with respect to s is the set of strings t which can follow s in sentences of R, or: the set of strings t such that the concatenation of s and t is a sentence in R.

Regular sets of strings can, of course, be denoted by regular expressions, and Brzozowski's contribution was to show how, given (1) a regular expression E denoting the language R and (2) a string s, to calculate a regular expression D denoting the derivative of R with respect to s. He also proved (3) that of all the derivatives of an expression, only a finite number would be distinct from each other in terms of recognizing different languages, and (4) that even if equal expressions are not always detected, there will still be only a finite number of dissimilar derivatives, if certain simple tests of similarity are performed; he then showed (5) how to construct a finite-state automaton from the set of characteristic derivatives thus identified.

This talk isn't quite as fast paced as Florescu's, but it too could really use three hours and a blackboard instead of 45 minutes and slides. It's quite mathematical. The notation he's using is unfamiliar to me. He explains it, but following this is going to be tough.

Brzozowski derivatives allow you to avoid building the finite state automaton when evaluating regular expressions. They can also handle non-deterministic regular expressions very simply. This has important implications for validation. Matching a string against a regular expression reduces to the question of whether the derivative of the expression with respect to that string is nullable; i.e. whether it accepts the empty string.
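Since I found the notation hard to follow, here is a minimal sketch of the technique in Python. It is my own illustration under stated assumptions, not Sperberg-McQueen's code or notation: expressions are nested tuples, the alphabet is a set of symbols such as child element names, and every function name is invented. The simplifications in seq() and alt() are a crude stand-in for the similarity tests mentioned in the quote above.

```python
# Regular expressions over symbols (for example, child element names),
# represented as nested tuples.
EMPTY = ("empty",)   # matches nothing at all
EPS = ("eps",)       # matches only the empty sequence

def sym(name):
    return ("sym", name)

def seq(r, s):
    # Simplify as we build, so derivatives stay small.
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("seq", r, s)

def alt(r, s):
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("alt", r, s)

def star(r):
    return ("star", r)

def nullable(r):
    """True if r accepts the empty sequence."""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind == "seq":
        return nullable(r[1]) and nullable(r[2])
    if kind == "alt":
        return nullable(r[1]) or nullable(r[2])
    return False  # "empty" and "sym"

def derivative(r, a):
    """Expression for what may follow the symbol a in sentences of r."""
    kind = r[0]
    if kind == "sym":
        return EPS if r[1] == a else EMPTY
    if kind == "seq":
        first = seq(derivative(r[1], a), r[2])
        return alt(first, derivative(r[2], a)) if nullable(r[1]) else first
    if kind == "alt":
        return alt(derivative(r[1], a), derivative(r[2], a))
    if kind == "star":
        return seq(derivative(r[1], a), r)
    return EMPTY  # "empty" and "eps"

def matches(r, symbols):
    """Take one derivative per input symbol, then ask if the result is nullable."""
    for a in symbols:
        r = derivative(r, a)
    return nullable(r)

# (a, b) | (a, c): not deterministic, since "a" alone does not pick a branch.
MODEL = alt(seq(sym("a"), sym("b")), seq(sym("a"), sym("c")))
```

No automaton is ever built, and the non-deterministic content model needs no special handling: after reading "a" the derivative is simply (b | c).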

Empty sequences and empty choices (sets) are legal in the W3C XML schema language. But many think restrictions on xsd:all groups are too onerous. (I agree.)

Tuesday, August 2, 2005 (Permalink)

I'm at the Extreme Markup Languages conference in Montreal this week. As the mood strikes me, I may update this site in real time. However, it will be a little slow going at first as I'm giving one of the first talks this morning, and I just found a bug in the software I'm announcing and demoing here, and have just over two hours in which to fix it. Plus I realize I left the most recent version of my notes sitting on my desktop at home, and have to recreate the recent changes. :-) My camera is acting up. Unless I can figure out how to fix it, I may not be posting any photos from this year's show.

Is overlap really a question of multiple trees and diffs between them?

My talk is over now. There were a lot of good suggestions in the talk, and I may be busy for a while trying to implement some of the ideas. What I talked about was a small program that obscures XML by randomizing its content and optionally its names while preserving its structure. This enables documents to be submitted to tool maintainers to reproduce bugs without exposing private information. It's currently in very rough shape, just raw source, not even a zip file. I need to improve that now that it's been officially announced. In the meantime, if you're curious probably the best way to get started is by reading the conference paper. It's nice being the first talk at the conference. Now I can give my full concentration to listening to other people, without being constantly distracted by thinking about what I'm going to say. (Last year I basically ignored a talk I really should have heard on GXParse because it happened to fall right before my own XOM talk.)
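For a rough idea of what obscuring means, here is a minimal sketch in Python. It is not the program from the talk, just an illustration under stated assumptions: it uses the standard library's ElementTree, scrambles text and attribute values only (names are left alone), and the function names obscure and _scramble are invented. ElementTree drops comments and processing instructions by default, so those would need separate handling in anything real.

```python
import random
import string
import xml.etree.ElementTree as ET

def _scramble(text, rng):
    """Swap letters and digits for random ones; keep everything else."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append(rng.choice(string.ascii_lowercase))
        elif ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # whitespace and punctuation survive as-is
    return "".join(out)

def obscure(xml_text, seed=None):
    """Return xml_text with its content randomized and its structure intact."""
    rng = random.Random(seed)
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.text:
            el.text = _scramble(el.text, rng)
        if el.tail:
            el.tail = _scramble(el.tail, rng)
        for name in list(el.attrib):
            el.set(name, _scramble(el.attrib[name], rng))
    return ET.tostring(root, encoding="unicode")
```

Element nesting, mixed content boundaries, and the length of every text run are preserved, which is usually what a parser or stylesheet bug depends on.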

You know the saying "It steam engines when it's steam engine time"? Sometimes at these conferences you can hear the whistle of the oncoming steam engine a little early. Of course, sometimes the steam engine derails on the way (Schemas); and sometimes it always seems to be right around the corner (XQuery, RDF, Topic Maps, Semantic Web). But sometimes it really does arrive on schedule (XML, UML, Java, HTML, HTTP, REST, RELAX NG). I think I'm picking up the sound of the next train. I've heard it in at least three different places just today, and I don't think these people are working together yet; but they're all heading toward the same station.

Right now I'm listening to Kristoffer H. Rose talk about the Data Format Description Language DFDL (pronounced "Daffodil"). This is a way of mapping from standard binary formats like JPEG, COBOL copybooks, and C code into XML. What jumps out at me about this is he specifically does not want to convert this to an XML document. He just wants to expose the data through an XML interface. And I'm hearing something similar on a lot of fronts right now, including some of my own work in XOM.

The point is that the sheer cost of converting all the data to Strings (and other objects) is starting to limit parser performance. To some extent, this is nothing new. SAX quite deliberately does not pass String to the characters() method. Instead it passes a char[] array and an index into that array. This allows the parser to keep passing the same array to the method and simply update the index. However, SAX, StAX, and similar APIs still create a lot of strings: for each element and attribute name for example. Tree-models like DOM and XOM are even more profligate with object creation. Good parsers like Xerces reuse the same strings; but if you've ever profiled deeply into an XML application you're likely to see a lot of time spent in String creation regardless. What's really annoying about this is that most of the time you don't need most of those strings. A typical application only uses a small subset of the strings (and other objects) an XML parser creates. The I/O cost of moving all this data around can also be significant.

A lot of developers here seem to be converging on the same solution from different directions. The destination is what one of the posters downstairs calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, just pass pointers into the actual, real XML. In some cases you wouldn't even need to hold the document in memory. It could remain on disk. This won't work with traditional APIs like SAX and DOM. However, it might be important enough to justify a new API. Many, though not all, use cases could see an order of magnitude speed-up or better from such an approach. Memory usage could improve too. Current tree models typically require at least 3 times the size of the actual document, and often more. Using a model based on indexes into one big array might allow these to reduce their requirements to twice the size of the original document or even less. Finally, this approach would make retrieving the actual original text of the document feasible, so you could finally tell whether a document used &amp; or a numeric character reference such as &#38;. Most programs don't need this ability, but it would be very useful for XML editors and other programs that want to do better round-tripping.
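A toy version of the idea, sketched in Python: the "parse" produces nothing but integer offsets into the original text, and a string is only materialized when someone asks for it. This is a deliberately naive illustration, not anyone's actual design. The regular expression ignores comments, CDATA sections, processing instructions, and '>' inside attribute values, and the names index_document and raw are invented. Python is only here to show the data structure; the performance argument is about languages like Java and C.

```python
import re

# Deliberately naive tag pattern; see the caveats above.
_TAG = re.compile(r"<[^>]+>")

def index_document(source):
    """Return a list of (kind, start, end) spans into `source`.

    kind is "tag" or "text". Each span is just a pair of offsets into
    the original document; no substrings are stored.
    """
    spans = []
    pos = 0
    for m in _TAG.finditer(source):
        if m.start() > pos:
            spans.append(("text", pos, m.start()))
        spans.append(("tag", m.start(), m.end()))
        pos = m.end()
    if pos < len(source):
        spans.append(("text", pos, len(source)))
    return spans

def raw(source, span):
    """Materialize one span as a string, only when the caller asks."""
    _, start, end = span
    return source[start:end]
```

Because the spans point at the untouched source, raw() returns the text exactly as written, so a document that spells an ampersand as &amp;amp; can be told apart from one that uses &amp;#38;. Joining every span back together reproduces the input byte for byte.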

Jon Bosak is doing a last minute fill-in presentation on UBL. Basically this is a plan to convert a lot of paper documents to XML forms by defining a "royalty-free library of standard business documents." I don't buy it. Bosak says, "You trade off all the wonderful ways you were customizing things, and you do without it." Customizations are not a mistake. They are necessary functions of doing business. I don't believe in making all businesses work alike just to standardize a few forms and take more humans out of the loop. (Plus it's not RESTful, but Bosak thinks that's fixable.)

Monday, August 1, 2005 (Permalink)

JAPISoft has released EditiX 4.2, a $99 payware XML editor written in Java. Features include XPath location and syntax error detection, context sensitive popups based on DTD, W3C XML Schema Language, and RelaxNG schemas, XSLT and XSL-FO previews, XInclude, XML catalogs, an XSLT debugger, DocBook support, and multi-view preview. Version 4.0 adds XSLT 2.0 and XQuery support and can import Excel comma separated values files. EditiX is available for Mac OS X, Linux, and Windows.

Sunday, July 31, 2005 (Permalink)

Microsoft has posted the first beta of Internet Explorer 7 (available to MSDN subscribers only). I assume it supports the usual batch of standards: XML, XSLT, HTML, XHTML, CSS, etc. in the usual poor way. I haven't installed it yet. Can someone who has please check and see if it finally gets MIME types right or if it's still depending on non-standard types like text/xsl? In particular, load these two pages in IE7 and tell me what you see:

Saturday, July 30, 2005 (Permalink)

The W3C CSS Working Group has published a new working draft of CSS3 Values and Units. According to the abstract, "This CSS3 module describes the various values and units that CSS properties accept. Also, it describes how values are computed from 'specified' (which is what the cascading process yields) through 'computed' and 'used' into 'actual' values."

Friday, July 29, 2005 (Permalink)

The W3C XSL Working Group has published the last call working draft of Extensible Stylesheet Language (XSL) Version 1.1. Despite the more generic name, this actually only covers XSL Formatting Objects, not XSL Transformations. New features in 1.1 include:

Multiple flows
Change marks
Back of the book indexing
Bookmarks
Markers in tables
fo:page-number-citation-last
fo:page-sequence-wrapper
clear and float inside and outside
prefixes and suffixes for page numbers

Comments are due by September 16.

Altova has released AltovaXML, a closed source, free-beer XML parser/XSLT engine for Windows. AltovaXML supports XSLT 1, XML Schemas, XQuery, XSLT 2, DTDs, and XML.

Thursday, July 28, 2005 (Permalink)

YesLogic has posted the third beta of Prince 5.0, a $295 payware batch formatter for Linux, Windows, and Mac OS X that produces PDF and PostScript from XML documents with CSS stylesheets. New features in 5.0 include Unicode, PDF links, bookmarks and security, footnotes, cross-references and CSS positioning. This beta adds support for the word-spacing, visibility, empty-cells, and clip CSS properties.

Wednesday, July 27, 2005 (Permalink)

The XML Apache Project has released version 2.7.1 of Xerces-J, an open source XML parser for Java that supports XML, DTDs, the W3C Schema language, SAX2, DOM3, and XInclude. This release fixes one bug in XInclude and several schema related bugs.

Tuesday, July 26, 2005 (Permalink)

Monday, July 25, 2005 (Permalink)

This morning I was reminded yet again of why XML is a vastly superior format to plain text. I wasted at least an hour trying to troubleshoot a problem that in the end came down to this. My cvspserver file was

service cvspserver {
        disable = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /sw/bin/cvs
        server_args     = -f --allow-root=/usr/local/CVS pserver
        groups          = yes
        flags           = REUSE
}

It should have been:

service cvspserver
{
        disable = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /sw/bin/cvs
        server_args     = -f --allow-root=/usr/local/CVS pserver
        groups          = yes
        flags           = REUSE
}

If the difference eludes you, well it eluded me too. This never would have been an issue in XML. The problem is that config files are not plain text. They have embedded markup. It's just that the markup is composed of line breaks, tabs, spaces, and other invisible characters you can't even see. Furthermore, every different tool uses a slightly different markup format, and more often than not that format isn't even documented. The real contest is not between XML and plain text. If it were the first file would have worked. The contest is between reliable, robust, explicit, well-understood, well-documented XML formats and fragile, idiosyncratic, implicit, undocumented formats. When you realize that's the choice, there really is no choice. In 2005 no new config format should be anything other than XML.
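To illustrate the claim, here is a minimal sketch using Python's standard ElementTree. The XML vocabulary is entirely hypothetical (xinetd reads no such format), and load_settings is an invented name. The two documents differ only in where the line breaks and indentation fall, including a line break before the closing angle bracket of the start-tag, which is the moral equivalent of my misplaced brace.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of part of the service entry above.
CRAMPED = """<service name="cvspserver"><disable>no</disable>
  <socket_type>stream</socket_type><user>root</user>
  <server>/sw/bin/cvs</server></service>"""

# Same content, different line breaks and indentation.
SPREAD_OUT = """<service name="cvspserver"
>
    <disable>no</disable>
    <socket_type>stream</socket_type>
    <user>root</user>
    <server>/sw/bin/cvs</server>
</service>"""

def load_settings(xml_text):
    """Return (service name, {setting: value}) from the hypothetical format."""
    root = ET.fromstring(xml_text)
    settings = {child.tag: child.text.strip() for child in root}
    return root.get("name"), settings
```

Both layouts load to the same name and the same settings, because the markup that matters is explicit rather than hidden in the whitespace.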

Sunday, July 24, 2005 (Permalink)

SyncroSoft has released version 6.1 of the <Oxygen/> XML editor. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. The major new feature in 6.1 is an XSLT Profiler. There are also a couple of dozen other assorted minor improvements and bug fixes. Oxygen costs $196 with support. Upgrades from 6.0 are free.

Friday, July 22, 2005 (Permalink)

Benjamin Pasero has released RSSOwl 1.1.3, an open source RSS reader written in Java and based on the SWT toolkit. Version 1.1.3 fixes various bugs, shows the number of unread items in each channel next to the name, supports drag and drop, and adds support for the feed protocol. RSSOwl is the best open source RSS client I've seen written in Java. That said, it still doesn't feel right to me. Even ignoring various small bugs and user interface inconsistencies, news just doesn't flow in this client. The three-pane layout that separates the news item titles from each news item doesn't work well for me.

Ranchero has released NetNewsWire 2.0.1, a closed source RSS client for the Mac. It's available in both free-beer lite and $25 payware versions. Version 2.0.1 adds support for Atom 1.0 and fixes a few assorted bugs.

The Omni Group has released OmniWeb 5.1.1, a $29.95 payware web browser for Mac OS X. OmniWeb 5.x is based on the same KHTML engine Safari uses. However, it has much less XML support; in effect none. It seems to believe that all XML documents should be RSS and gets very confused when they aren't. This is one of the most XML-hostile browsers I've seen in years. Give it a pass.

Thursday, July 21, 2005 (Permalink)

The Mozilla Project has released version 1.0.6 of Firefox, the open source web browser that is rapidly gaining on Internet Explorer. Firefox supports HTML, XHTML, CSS, and XSLT. MathML and SVG aren't supported out of the box, but can be added. 1.0.6 fixes a few bugs affecting extensions introduced in 1.0.5 last week.

Wednesday, July 20, 2005 (Permalink)

Nokia and Sun have submitted JSR-279: Service Connection API for Java ME to the Java Community Process (JCP). According to the JSR:

This JSR proposes a general-purpose high-level Service Connection API for Java™ ME for mobile devices. The API is intended to support writing mobile clients for identity-based Web services, service-oriented architectures (SOA), and other similar network service application models involving service discovery, authentication and identity. Existing Web services APIs tend to focus on support for low-level protocols, such as SOAP and Web Services Security. However, high-value Web services for mobile devices may be quite complex, requiring identity-based discovery and authentication, multiple service providers, and invocation of device-hosted services. These may require extensive protocol exchanges, complex state machines and other logic. To provide portability and interoperability such applications need to be based on frameworks that specify how multiple protocols and services can work together in a standard way. An example of such a standard framework that is currently being deployed is the Liberty Identity Based Web Services Framework (IDWSF), specified by the Liberty Alliance. Other frameworks with similar goals are also being specified and deployed, including for example the not yet standardized WS* specification suite or UPnP. The supported model is general enough that it could also be extended to non-Web services frameworks.

While it is theoretically possible to write such a framework-based application using only low-level Web services protocol APIs, the programming required would be very complex and would require implementing high-level protocols and other logic already well-specified by the framework. It makes much more sense to provide developers with a standard API to wrap framework behavior so that they can concentrate on service and application specific logic only.

The proposed JSR will specify a high-level Service Connection API for Java™ ME that supports a simple GCF-like model for application interaction with services. The API will also cover the configuration needed to bootstrap interaction with service frameworks.

Reading between the lines, what this says is that SOAP/WSDL/etc. are still too complicated even when they hide the XML behind APIs like JAX-RPC so they need yet another layer separating the average developer from the XML. While I don't deny that SOAP, WSDL, etc. are quite complex, maybe the problem is not the APIs? Maybe the problem is that the underlying specs are too complex and too poorly designed and no API is ever going to be able to hide that fundamental complexity? Maybe the solution should be to stop building systems on top of such baroque, rigid and brittle foundations and instead build on top of simple, flexible systems like HTTP that bend instead of breaking? Just a thought. :-) Comments are due by August 1.

Tuesday, July 19, 2005 (Permalink)

Peter Flynn has published version 4.2 of the XML FAQ. Updates in this version include:

Address for the new RNG mailing list
Updated section on Schemas
New links for the SGML Declaration for XML
Credits for some untagged names
New list of related FAQs
Expanded question on "Why XML?"
Additional information on WYSIAYFWG interfaces for XSLT
New link to email a question/answer to someone

Monday, July 18, 2005 (Permalink)

Michael Smith has released version 1.69 of the DocBook XSL stylesheets. These support transforms to HTML, XHTML, and XSL-FO. Major enhancements in this release include support for DocBook 5.0 (which uses a namespace, unlike previous versions of DocBook) and localizations for Albanian, Amharic, Azerbaijani, Hindi, Irish Gaelic, Gujarati, Kannada, Mongolian, Oriya, Punjabi, Tagalog, Tamil, and Welsh. There are also many detailed updates to the formatting.

Oleg Tkachenko has released nxslt 1.6, a Windows command line utility for accessing the .NET XSLT engine. "New features include optionality for source XML or stylesheet, pretty printing, ASCII only escaped output and support for 'omit-xml-declaration' attribute of the exsl:document extension element." nxslt is written in C# and requires the .NET Framework version 1.0 to be installed.

Tkachenko has also posted the first beta of nxslt 2.0. "nxslt 2.0 uses new XSLT 1.0 processor in the .NET 2.0 Framework - System.Xml.Xsl.XslCompiledTransform class. Hence it requires .NET 2.0 Beta2 or higher. As a first beta version, nxslt 2.0 Beta1 is quite limited - no support for XInclude, EXSLT, multiple outputs and embedded stylesheets yet."

Sunday, July 17, 2005 (Permalink)

Bruce D'Arcus has published Citeproc, "an XSLT 2.0-based analog to bibtex." Formatting is configured in an XML citation style language (CSL). It currently supports DocBook NG, though instead of the limited DocBook bibliographic support it uses the Library of Congress's Metadata Object Description Schema (MODS).

Saturday, July 16, 2005 (Permalink)

The W3C XML Core Working Group has posted the proposed recommendation of xml:id Version 1.0. This describes an idea that's been kicked around in the community for some time. The basic problem is how to link to elements by IDs when a document doesn't have a DTD or schema. The proposed solution is to predefine an xml:id attribute that would always be recognized as an ID, regardless of the presence or absence of a DTD or schema. This draft loosens up error handling somewhat, but doesn't make any really major changes since the candidate recommendation.
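Here is a minimal sketch of what that buys you, using Python's standard ElementTree. It assumes nothing beyond the predefined xml prefix. The document and the element_by_xml_id helper are invented for illustration, and the helper is a plain linear scan, not an API from any xml:id-aware processor.

```python
import xml.etree.ElementTree as ET

# The xml prefix is always bound to this namespace; no declaration is needed.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# No DTD, no schema: nothing else marks these attributes as IDs.
DOC = """<book>
  <chapter xml:id="intro"><title>Introduction</title></chapter>
  <chapter xml:id="links"><title>Linking</title></chapter>
</book>"""

def element_by_xml_id(root, wanted):
    """Return the first element whose xml:id equals `wanted`, or None."""
    for el in root.iter():
        if el.get(XML_ID) == wanted:
            return el
    return None
```

The lookup works purely from the attribute's name, which is the whole point: a link such as #links can be resolved without reading a DTD or schema first.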

Unfortunately, it's recently been discovered that this scheme is pretty badly incompatible with canonical XML, which likes to inherit attributes in the XML namespace onto descendant elements, thus moving xml:id's from one element to another. This has downstream effects on XML digital signatures and XML encryption.

The working group has basically decided to blame canonical XML for the problem, and not address it themselves. They're half right. Canonical XML should not have assumed that all attributes in the XML namespace would act like xml:lang and xml:space; but it did; and canonical XML is now a four-year old deployed recommendation. xml:id is not. Moving forward with a spec with such an obvious known incompatibility strikes me as less than wise, but I don't get a vote.

Friday, July 15, 2005 (Permalink)

IBM developerWorks has published the latest article in my managing XML Data series, "eXist -- an open source native XML database". This is a quick intro to eXist that demonstrates setting up a sample database and querying it with XQuery using both the GUI tools and the RESTful interface. My overall impression of eXist is that it's got potential, but that it's not ready for prime time yet. In the future, I hope to write some similar articles covering other products like dbXML 2.0, Berkeley dbXML, and Mark/Logic to see how they compare.

Wednesday, July 13, 2005 (Permalink)

The Mozilla Project has released version 1.0.5 of Firefox, the open source web browser that is rapidly gaining on Internet Explorer. Firefox supports HTML, XHTML, CSS, and XSLT. MathML and SVG aren't supported out of the box, but can be added. 1.0.5 is a security update that is recommended for all users. Version 1.7.9 of the integrated Mozilla suite with the same fixes is expected next week.

The Mozilla Project has also posted the second alpha of Deer Park, a.k.a. Firefox 1.1. Besides the security fixes in 1.0.5, new features in alpha 2 include

Software update system to streamline product upgrades (currently disabled; will be turned on shortly for testing)
Faster back and forward buttons
Drag-and-drop reordering for browser tabs
Improved popup blocking
Better support for Mac OS X including a Safari profile migrator, Aqua compliance and shell service

Tuesday, July 12, 2005 (Permalink)

Daniel Veillard has released version 2.6.20 of libxml2, the open source XML C library for Gnome. This release fixes assorted bugs, including improved schema support.

Monday, July 11, 2005 (Permalink)

Sun and IBM have released the final version of Java Specification Request (JSR) 105, XML Digital Signature APIs. "The purpose of this JSR is to define a standard Java™ API for generating and validating XML signatures." A reference implementation is included with the Java Web Services Developer Pack 1.6. All that's available in the spec is JavaDoc for the javax.xml.crypto package. The lack of any real examples is disturbing. Designing APIs without attempting to write sample code and tutorials generally leads to APIs that are too hard to use, too hard to understand, have significant gaps in coverage, and spend excessive effort on parts nobody needs. When working on XOM, I don't consider anything complete until there's sample code and documentation, beyond merely the API documentation. The library developer's view of an API is all too often too focused on a class's internals and not focused enough on what the library looks like to a client of the library. Writing tutorials and sample programs forces you to switch perspectives and results in cleaner, saner, simpler APIs.

Saturday, July 9, 2005 (Permalink)

The W3C XML Core Working Group has posted the last call working draft of the XML Linking Language (XLink) Version 1.1. There are three major changes in XLink 1.1 compared to 1.0:

XLinks now contain IRIs rather than URIs
All attributes in the XLink namespace are now reserved for future versions of XLink.
Most importantly, the xlink:type="simple" attribute is no longer required.

That is, a simple link can now be written like this:

<composer xlink:href="http://www.beand.com/">Beth Anderson</composer>

It's no longer necessary to write this:

<composer xlink:type="simple" xlink:href="http://www.beand.com/">Beth Anderson</composer>

This is a good thing. I'm not sure who first came up with this idea, but I've been advocating it for a while now. This makes XLink a lot more palatable in applications like XHTML 2 and SVG.
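For software that consumes XLinks, the change amounts to one extra rule: an element with an xlink:href attribute and no xlink:type attribute is treated as a simple link. A minimal sketch in Python follows. The helper name simple_link_target is invented, and this covers only the simple-link case, not extended links, arcs, or locators.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
TYPE = "{%s}type" % XLINK
HREF = "{%s}href" % XLINK

def simple_link_target(el):
    """Return the href if `el` is a simple link that has one, else None."""
    link_type = el.get(TYPE)
    href = el.get(HREF)
    if link_type == "simple":
        return href  # the XLink 1.0 spelling still works
    if link_type is None and href is not None:
        return href  # 1.1 rule: xlink:href alone implies a simple link
    return None
```

Both composer elements shown above resolve to the same target, while an element with no XLink attributes, or one whose xlink:type is something other than simple, yields None.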

Friday, July 8, 2005 (Permalink)

Opera Software has posted the first beta of version 8.0.2 of their namesake $39 payware web browser for Windows, Linux, and the Mac. The major new feature in this beta is BitTorrent support. As usual Opera is marching to its own drummer. Now that they've done it, BitTorrent support in the browser seems almost obvious; but they're first out of the gate. However, they still haven't implemented some basic features everyone else has, most notably XSLT. It feels reminiscent of the old Mac word processor Nisus Writer. Half the time you felt like you were using your grandkids' word processor and the other half the time you felt like you were using your grandfather's. There's something to be said for clearing new territory instead of replowing the same old fields. However, in my experience failing to implement the core features of the market leader dooms one to being a niche player. There were users who swore by (and at) Nisus Writer; but most authors, myself included, just couldn't get past the lack of features we'd grown accustomed to in Word, WordPerfect, and other more conventional products. We liked Nisus's macro language, non-contiguous selection, regular expressions, Unicode support, and other features the market leaders wouldn't add for years to come. But we just could never quite get over the features we missed from Word. For me, outline mode was the killer. Opera feels similar. It's really, really nice; and really stands out from the crowd in so many ways. But damn it, I just can't stomach a browser that won't do XSLT so I stay with Firefox.

Thursday, July 7, 2005 (Permalink)

The Sibelius Group has released Sibelius 4.0, a $599 payware music notation program for Mac OS X. This release adds MusicXML support (among numerous other features of less interest to this audience.)

The W3C Voice Browser Working Group has published the third last call working draft of Voice Browser Call Control: CCXML Version 1.0. According to the spec abstract, "CCXML is designed to provide telephony call control support for dialog systems, such as VoiceXML. While CCXML can be used with any dialog systems capable of handling media, CCXML has been designed to complement and integrate with a VoiceXML interpreter. Because of this there are many references to VoiceXML's capabilities and limitations. There are also details on how VoiceXML and CCXML can be integrated. However, it should be noted that the two languages are separate and are not REQUIRED in an implementation of either language. For example, CCXML could be integrated with a more traditional Interactive Voice Response (IVR) system or a 3GPP Media Resource Function (MRF), and VoiceXML or other dialog systems could be integrated with other call control systems." There've been lots of significant changes since the last draft. Comments are due by July 29.

Wednesday, July 6, 2005 (Permalink)

The W3C Cascading Style Sheets working group has posted the second public working draft of CSS3 Text Effects Module. "This CSS3 module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, text decoration and text transformation." Properties defined in this spec include:

word-break
hyphenate
text-wrap
word-wrap
text-align
text-align-last
text-justify
word-spacing
letter-spacing
text-kashida-space

Tuesday, July 5, 2005 (Permalink)

The W3C Quality Assurance Working Group has posted the proposed recommendation of the QA Framework: Specification Guidelines. According to the abstract, "Much effort goes into writing a good specification. It takes more than knowledge of the technology to make a specification precise, implementable and testable. It takes planning, organization, and foresight about the technology and how it will be implemented and used. The goal of this document is to help W3C editors write better specifications, by making a specification easier to interpret without ambiguity and clearer as to what is required in order to conform. It focuses on how to define and specify conformance for a specification. Additionally, it addresses how a specification might allow variation among conforming implementations. The document contains a set of guidelines or requirements, supplemented with good practices, examples, and techniques."

The W3C Quality Assurance Working Group has also posted the third public working draft of Variability in Specifications. "This document analyzes how design decisions of a specification's conformance model may affect its implementability and the interoperability of its implementations. To do so, it introduces the concept of variability - how much implementations conforming to a given specification may vary among themselves - and presents a set of well-known dimensions of variability. Its goal is to raise awareness of the potential cost that some benign-looking decisions may have on interoperability and to provide guidance on how to avoid these pitfalls by better understanding the mechanisms induced by variability."

Sunday, July 3, 2005 (Permalink)

The XML Apache Project has released XMLBeans 2.0, one of many XML data binding frameworks for Java. This one is based on the W3C XML Schema Language and also provides access to the full underlying XML Infoset through an XML Cursor API. New features in 2.0 include:

XQuery/XPath integration
DOM Level 2
Support for custom methods to generated XMLBeans.
Error codes
Fail-fast behavior for simple types
Access to the post schema validation infoset during validation
Java 1.5 Generics in generated code if you like.
Tools for generating schemas from instance documents and vice versa

This release also claims to increase performance. However, some of that comes from using the Piccolo XML parser, which concerns me. Piccolo was a promising non-validating parser written in Java using a compiler generator instead of a hand-written parser. However, it has got more than a few bugs, and the developer hasn't made any progress on fixing them for a long time. Perhaps the XMLBeans Project has started fixing those bugs themselves, which would be nice since it would be a shame to see an otherwise promising parser languish. However, if they haven't then that's a big step backward, and virtually guaranteed to introduce bugs into XMLBeans they may not have noticed yet.

Saturday, July 2, 2005 (Permalink)

Nikolaus Gebhardt has released irrXML 1.1, a small XML parser written in C++ with its own unique API. These days I'm extremely skeptical of new parsers, especially ones that advertise speed and size benefits as irrXML does. All too often they achieve that by cutting corners on mandatory aspects of XML like well-formedness checking or the internal DTD subset. Perusing the API documentation I don't see any obvious mistakes. The API appears to be missing support for comments and processing instructions, but that's not illegal as long as these are parsed correctly. More seriously, the API doesn't appear to have any namespace support, so this would tend to rule out irrXML for general purpose XML processing.
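
To illustrate why the lack of namespace support matters, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than irrXML itself. The two sample documents and the helper names are made up for illustration. A namespace-aware processor should treat the two roots as the same element even though their prefixes differ, and a conforming parser should reject a document that is not well-formed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same namespace URI, different prefixes.
DOC_A = '<svg:rect xmlns:svg="http://www.w3.org/2000/svg" width="10"/>'
DOC_B = '<s:rect xmlns:s="http://www.w3.org/2000/svg" width="10"/>'


def qualified_name(xml_text):
    """Return the root element's name in {namespace-uri}local form."""
    return ET.fromstring(xml_text).tag


def is_well_formed(xml_text):
    """Report whether the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # The prefix is irrelevant; only the URI and local name count.
    print(qualified_name(DOC_A) == qualified_name(DOC_B))
    # Mismatched tags must be reported as an error, not silently accepted.
    print(is_well_formed("<a><b></a>"))
```

A parser that exposes only the raw name (svg:rect versus s:rect) leaves this comparison to every client application, which is where mistakes tend to creep in.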

Friday, July 1, 2005 (Permalink)

The W3C XKMS Working Group has published the final recommendations of XML Key Management Specification (XKMS) and XML Key Management Specification (XKMS) Bindings. XKMS is a set of "protocols for distributing and registering public keys, suitable for use in conjunction with the standard for XML Signatures [XML-SIG] defined by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) and companion standard for XML encryption [XML-ENC]. The XML Key Management Specification (XKMS) comprises two parts -- the XML Key Information Service Specification (X-KISS) and the XML Key Registration Service Specification (X-KRSS). These protocols do not require any particular underlying public key infrastructure (such as X.509) but are designed to be compatible with such infrastructures."

Thursday, June 30, 2005 (Permalink)

I've posted the second beta release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath queries, document subset canonicalization, exclusive XML canonicalization, external XSLT parameters, and xml:id support. The API is now considered to be reasonably stable, and probably won't change before 1.1 final. Beta 2 is primarily a bug fix release. Numerous bugs have been fixed in XPath. In addition a couple of other random issues have been fixed including namespace handling in SAX conversion and attribute parentage in copied elements. This release probably introduces a lot of bugs in serialization with Unicode Normalization Form C (NFC), though as yet I haven't proved that. Addressing this will be a major focus for beta 3. If anyone has test cases that demonstrate incorrect handling of NFC, I'd appreciate it if you'd send those my way. Beta 2 also makes some optimizations that should slightly decrease the memory footprint. XOM requires Java 1.2 or later and is published under the LGPL.
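
For readers unfamiliar with NFC, the following is a minimal sketch of what the normalization does, using Python's standard unicodedata module rather than XOM. The sample string and the helper name is_nfc are made up for illustration. The kind of test case that would be useful is text containing a base letter followed by a combining mark, which NFC should serialize as a single precomposed character.

```python
import unicodedata


def is_nfc(text):
    """Return True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


# 'e' followed by U+0301 COMBINING ACUTE ACCENT, twice.
decomposed = "re\u0301sume\u0301"

# NFC composes each pair into the single character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

if __name__ == "__main__":
    print(len(decomposed), len(composed))
    print(is_nfc(decomposed), is_nfc(composed))
```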

Wednesday, June 29, 2005

I've posted beta 7 of Jaxen 1.1, an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Jaxen was originally written by James Strachan and Bob McWhirter. Brian Ewins also did a lot of work for this release. Beta 7 expands the JavaDoc, improves compatibility with both Java 1.4 and 1.5, fixes some thread safety issues, modularizes the test suite, and addresses numerous areas where Jaxen wasn't correctly implementing the XPath specification including:

Operator associativity
Correctly counts characters from beyond the Basic Multilingual Plane
Improved string functions including string-length, translate, and substring
Proper formatting of Inf and NaN

While there's always the possibility (certainty?) of as yet unnoticed bugs, I think all known issues in the core have now been addressed. There are still a couple of issues in the JDOM navigator to deal with. All users should upgrade to this release. Several users have reported problems where the more rigorous checking in the latest betas have uncovered mistakes in their XPath expressions. If you try this release and something that used to work now fails, please check to see if the expression was correct in the first place. The current betas are much stricter and less forgiving than the older versions. In all cases I'm aware of, rewriting the expressions in correct XPath 1.0 syntax was easy and fixed the problems. Don't be fooled by the "beta" designation. This release has many fewer bugs and is much more conformant to the XPath specification than the official 1.0 release. We'll probably get around to calling it 1.1 final sometime later this year after doing more work on testing, documentation, performance, and code cleanup. However, there's no reason to wait for that. If you're using Jaxen, you should upgrade to this beta.
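
As an illustration of the kind of detail involved in "proper formatting of Inf and NaN", here is a minimal sketch of XPath 1.0's rules for converting a number to a string. It is written in Python for brevity and is not Jaxen's code; the function name xpath_string is made up. The rules it follows are those of the string() function in the XPath 1.0 specification: NaN and the infinities have fixed spellings, both zeros print as 0, integers have no decimal point, and exponent notation is never used.

```python
import math
from decimal import Decimal


def xpath_string(number):
    """Convert a number to a string following XPath 1.0's string() rules."""
    number = float(number)
    if math.isnan(number):
        return "NaN"
    if math.isinf(number):
        return "Infinity" if number > 0 else "-Infinity"
    if number == 0:
        # Covers both positive and negative zero.
        return "0"
    if number.is_integer():
        # Integers: no decimal point and no exponent.
        return str(int(number))
    # Non-integers: plain decimal notation, never exponent notation.
    return format(Decimal(repr(number)), "f")


if __name__ == "__main__":
    for value in (float("nan"), float("-inf"), -0.0, 3.0, 0.5, 1e-7):
        print(xpath_string(value))
```

A naive implementation that simply calls the host language's default number-to-string routine tends to get several of these cases wrong, for example by printing 3.0 or 1.0E-7.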

Tuesday, June 28, 2005

Michael Kay has released version 6.5.4 of Saxon, an XSLT 1.0 processor written in Java. Saxon is open source under the Mozilla Public License. Java 1.1 or later is required. This release fixes assorted bugs, but adds no new features.

Saxon's a very fast XSLT processor that's reasonably conformant to the specification (it still misses a few corner cases); but it's a classic example of how not to run an open source project. The source is available, but only just. The source code repository isn't public. Most importantly there are no test cases. They exist but they've never been released. (I hope to address part of the reason for this in my paper at Extreme Markup Languages in August.) Essentially only Michael Kay can update the software, fix bugs, or add features. This release only happened because Ken Holman organized a group of people to pay Kay to make a new release. I had suggested that they pay Kay to release the test suite rather than simply fixing a few bugs, but the group was more interested in a quick fix for a few nagging issues than establishing a solid foundation for long term product development. However, the bugs that still exist are unlikely to ever be fixed.

XOM originally bundled Saxon. However, I switched to Xalan not because it was faster or more conformant, but because there was a reasonable expectation that any bugs I uncovered would be fixed. Ditto, I use Jaxen instead of Saxon for XPath because I can mostly fix the bugs I find in Jaxen myself. I don't have to rely on someone else's availability and interest. Despite this recent update, Saxon still has a lot of bugs that aren't being addressed. If you're still using Saxon 6, then you should upgrade to Saxon 6.5.4; but most users should consider Xalan, libxml, or some other product instead.

Monday, June 27, 2005

Once again, I'll be chairing the XML track for Software Development 2006 West in Santa Clara next March. The Call for Proposals is now live. Besides XML, tracks include Web Services, Java, Emerging Technologies, C++, Requirements & Analysis, Testing & Quality, .NET, Mobile Development, Security, and People, Process, and Methods. We also have one new track this year, the Business of Software. Most sessions are 90 minute classes, but we also have room for half and full-day tutorials (I prefer half-day tutorials in the XML track), birds-of-a-feather sessions, and panels. Submissions are due by August 6th.

For the XML track, we're interested in practical sessions covering all aspects of XML. This is not specifically an XML show, so we tend to find that our audience responds better to more practical, how-to, basic sessions as opposed to more theoretical, high-level sessions. For instance, a simple introduction to XQuery would go over better than a detailed comparison of XQuery optimization techniques. One thing previous attendees have told us is that they'd like to see more new sessions at each show, so we're going to be looking preferentially for talks that have not previously been given at SD West.

Sunday, June 26, 2005

The XML Apache Project has released XML Security C++ 1.2, a C++ library that implements XML digital signatures. Version 1.2 fixes bugs and adds support for the XML Key Management Specification 2.0 (XKMS).

Saturday, June 25, 2005

The XML Apache Project has released Xerces-J 2.7, a major upgrade to the preeminent open source XML parser for Java.

This release provides a complete implementation of the parser related portions of JAXP 1.3 and also brings Xerces into compliance with SAX 2.0.2, the DOM Level 3 Core and Load/Save W3C Recommendations, the XML Inclusions (XInclude) Version 1.0 W3C Recommendation and the XML Schema 1.0 Structures and Datatypes Second Edition W3C Recommendations.

Xerces-J 2.7.0 incorporates two minor changes to the Xerces Native Interface. A new package org.apache.xerces.xs.datatypes has been added to Xerces' XML Schema API that provides a full schema datatype to object mapping. In addition, in this release we introduced many new parser features, improved parser performance in several areas and fixed many bugs.

Specifically, the significant changes introduced in this release are:

Implemented the following packages defined by JAXP 1.3: javax.xml.datatype, javax.xml.parsers and javax.xml.validation.

Defined and implemented interfaces (org.apache.xerces.xs.datatypes) for accessing actual values.

Implemented a feature that instructs the schema processor to use all schema location hints for a given target namespace when locating components.

Implemented partial experimental support for the first working draft of XML Schema 1.1 Structures and Datatypes.

Implemented features for configuring whether the XInclude processor performs base URI fixup and/or language fixup.

Implemented the XInclude Recommendation (December 2004).

Fixed a bug which caused the DTD validator to be activated when using the XInclude processor with schema validation turned on.

Modified the XNI XMLLocator interface to include getCharacterOffset and getXMLVersion. Added a getCharacterOffset method to XMLParseException.

Made various modifications to support the reporting of character offsets in XNI and DOM.

Implemented SAX 2.0.2 and SAX2 Extensions 1.1.

Fixed SAX conformance bugs including one concerning skipped entities.

Implemented a feature that allows schema annotations to be validated.

Added support for generating synthetic annotations when a schema component has non-schema attributes but no child annotation.

Reimplemented Text.replaceWholeText and TypeInfo.isDerivedFrom according to the DOM Level 3 Core Recommendation.

Created two new parser configurations that support XML 1.1.

Improved the performance of the SymbolTable, processing of attribute values and parsing of relative URIs.

Added support for EntityResolver2 and LSResourceResolver to XMLCatalogResolver.

In addition, many bugs were fixed.

Friday, June 24, 2005

The Mozilla Project has posted the first alpha of Firefox 1.1, a.k.a. Deer Park. "it is being made available for testing purposes only for developers and the testing community. Current users of Mozilla Firefox 1.0.x should not download or use Deer Park Alpha 1. Note: Deer Park Alpha 1 is not an official mozilla.org final release, it has been made available for testing purposes only, with no end-user support. If that sounds scary, you'd probably be better off with the latest final release." New features in 1.1 for XML developers include:

CSS @-moz-document selector for matching on site/document URL, useful in user stylesheets
SVG support is now turned on by default.
XML Events in JavaScript
An extension to support XForms
CSS 2 quotes support
CSS 2 counters support
Lots of neat CSS 3 features
URIs are now always encoded in UTF-8

Other new features in 1.1 include:

Sanitize "provides an easy way to quickly remove browsing history, cookies, cache, saved form information, and other personal data. The items to be removed can be customized, and the feature can be activated using either a keyboard shortcut or through a menu item."
When viewing images, tab icons now display thumbnails of the displayed image.
"Much faster session history navigation. The feature is off by default but can be enabled for testing purposes by setting the browser.sessionhistory.max_viewers preference to a nonzero number."
FTP users are prompted for a name and password if anonymous access fails.
Report a broken website wizard
Changes made in the Preferences window now apply immediately
Searchable download actions manager
Searchable cookie manager

I tried out one of the earlier nightly builds on my Mac. It felt a little slower than 1.0.4 but otherwise quite stable.

Google has released version 0.1 of AJAXSLT, an open source implementation of XSLT in JavaScript, of all things. It's intended to be used in browsers. I'm skeptical. Why not just use the browser's built-in XSLT engine and maybe DOM level 3 XPath? Except for Opera and Lynx, I think pretty much all major browsers have XSLT support these days. Still, according to the README:

Safari/2.0 has XSL-T built in, but it is not exposed to JavaScript, but is only applied to XML documents that have a stylesheet declaration when they are loaded.

Internet Explorer exposes XSLT via the transformNode() method on the XML DOM. However, this is not available if ActiveX is disabled.

Firefox exposes XSLT via the XSLTProcessor() object, however XPath is not exposed in the DOM. TODO(mesch): verify this.

Still, before reinventing the wheel, I'd prefer to work on improving the XSLT/XPath support in at least Firefox and Konqueror, the open source base for Safari. Implementing XPath alone is non-trivial, much less XSLT. Doing it in browser JavaScript strikes me as insane and unlikely to succeed. Just scanning the source code I can see lots of unimplemented functions, and lots of places they made the same mistakes we made in Jaxen. For instance they've got the same bug in counting characters in strings that I fixed a couple of days ago in Jaxen. XPath and XSLT have a lot of weird corner cases to deal with.
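
The character-counting bug is easy to reproduce in any language whose strings are sequences of UTF-16 code units, which includes both Java and JavaScript. The sketch below is in Python for brevity and is neither Jaxen's nor AJAXSLT's code; the function names are made up. XPath 1.0's string-length() counts Unicode characters, so a character outside the Basic Multilingual Plane must count once, even though UTF-16 stores it as a surrogate pair.

```python
def utf16_length(text):
    """Count UTF-16 code units, as a naive Java or JavaScript length would."""
    return len(text.encode("utf-16-le")) // 2


def xpath_string_length(text):
    """Count Unicode characters, as XPath 1.0 string-length() requires."""
    # Python 3 strings are sequences of code points, so len() is enough.
    return len(text)


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
G_CLEF = "\U0001D11E"

if __name__ == "__main__":
    print(utf16_length(G_CLEF), xpath_string_length(G_CLEF))
```

The same off-by-one-per-character error carries over to functions such as substring and translate, which index into the string by character position.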

Thursday, June 23, 2005

Planamesa Software has released NeoOffice/J 1.1, a Mac port of the open source OpenOffice suite. NeoOffice is built on top of Java rather than X-Windows. Mac OS X 10.2 or later is required. NeoOffice is published exclusively under the GPL.

This is one of the things I love about open source software. When one team fumbles the ball (as Sun did rather embarrassingly with their Mac port of OpenOffice) someone else can pick it up and run with it. Furthermore, two different teams can run toward the same goal along different paths to see who gets there first. It sounds like an inefficient duplication of resources, and it is; just like a <sarcasm>competitive marketplace is an inefficient duplication of resources compared to a nice, managed economy.</sarcasm> (We all know how well those turn out.) In this case, the official X-Windows-based solution for Mac OS X proved to be an unmitigated disaster. Planamesa's Java-based approach actually worked. Of course, there was no way to know that one approach would work and one would fail until two different groups tried it both ways.

The KDE Project has released KOffice 1.4, an open source office suite (word processor, spreadsheet, presentation program, etc.) for Linux. This release can save and open files in the XML-based OASIS OpenDocument file format that will be used by OpenOffice 2.0. New applications in this release include the Krita image editor and Kexi database.

JAPISoft has released JXP 1.3.7, a €199 payware XPath 1.0 API that can be customized to fit different object models. Out of the box it supports DOM and JAPISoft's own API. This release fixes bugs, improves Unicode support, and speeds up some evaluations.

Bare Bones Software has released version 8.2.2 of BBEdit, my preferred text editor on the Mac. This is a bug fix release. BBEdit is $179 payware. Upgrades from 8.x are free. They're $49 for 7.0 owners and $59 for owners of earlier versions. Mac OS X 10.3.5 or later is required.

Wednesday, June 22, 2005

The Mozilla Project has posted the first alpha of Camino 0.9, a Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. Version 0.9 adds XML pretty printing and various user interface improvements and speed ups. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Mac OS X 10.2 or later is required.

Monday, June 20, 2005

The W3C Voice Browser Working Group has published a note on SSML 1.0 say-as attribute values. "The say-as element in SSML 1.0 is considered one of the most useful elements of the language. However, SSML 1.0 does not define the values of the attributes of this element. This Note provides definitions for these attributes that cover many of the most common use cases for the say-as element."

Sunday, June 19, 2005

Andy Clark has posted version 0.9.5 of his CyberNeko Tools HTML Parser for the Xerces Native Interface (NekoXNI). This new version of the HTML parser is mostly a bug fix release. CyberNeko is written in Java. Besides the HTML parser, CyberNeko includes a generic XML pull parser, a DTD parser, a RELAX NG validator, and a DTD to XML converter.

Netscape has released version 8.0.2 of its namesake web browser for Windows. Mac OS X and Linux have been pretty much abandoned. This is a bug fix release.

Saturday, June 18, 2005

Cladonia Ltd. has released the Exchanger XML Editor 3.1, a $130 payware XML Editor written in Java. Features include

Schema Based Editing
Tag Prompting
Validation against DTD, XML Schema, RelaxNG
Tree View and Outliner for Tag Free editing
XPath and Regular expression searches
Schema Conversion
XSLT
Project Management
SVG Viewer and Conversion
Easy SOAP Invocations
Find in Files
Extension Handling
DTD editing
XML catalogs
RelaxNG and DTD based tag completion.
XSLT Debugger
XML Signature support
Better performance with large documents
WSDL Analyzer
WebDAV and FTP support
XInclude resolution
Unordered XML Differencing and Merging
Content Folding
Split Views
User defined Keyboard Shortcuts
Emacs Keyboard Shortcuts
Multiple Tag-Completion Schemas
Attribute Value Prompting
Navigator with XPath Filters

Version 3.1 adds a grid view and XSLT 2 debugging based on Saxon 8.4.

Friday, June 17, 2005

Opera Software has released version 8.0.1 of their namesake web browser for Windows, Solaris, FreeBSD, Linux, and the Mac. 8.0.1 is mostly a bug fix release, including some security fixes. Opera supports HTML, XML, XHTML, RSS, WML 2.0, SVG Tiny, XHTML+Voice, XmlHttpRequest, and CSS. However, XSLT is still not supported. Opera is $39 payware.

Thursday, June 16, 2005

The W3C Voice Browser working group has published a new note on Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0. In brief, this proposes using a processing instruction like <?access-control allow="*.poly.edu *.elharo.com"?> to specify who gets to see a particular document. According to the introduction,

A plethora of applications and data are exposed as XML over HTTP. User agents such as Voice and Web browsers fetch and execute applications but restrict the XML content accessible to those applications merely to the URLs located in the same domain as the application. To take advantage of the rich XML content available on the Web, application developers must resort to proxying the content through the domain hosting their application thereby increasing overhead and limiting scalability.

This note describes a mechanism being used in the industry that allows a content provider to use a processing instruction embedded within the XML content to specify the access policy of that content. In this model a user agent can safely extend the sandbox in which it has restricted the application to include access to the XML content if and only if the specified policy grants permission.

Although this comes out of the Voice group, it's more generally applicable and should probably be taken up by the XML Core group, though I doubt it will be. Processing instructions are decidedly out of fashion at the W3C these days. Personally, I'm not sure whether or not this is a good idea; but I'm always happy to see a new example of processing instructions since that means I can stop using the same tired old <?php?> and <?xml-stylesheet?> examples in my books and talks. :-)
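
To make the mechanism concrete, here is a rough sketch of how a user agent might read such a processing instruction, using Python's standard xml.dom.minidom. The helper names, the sample document, and the test hosts are made up for illustration. The matching rule in particular is an assumption: the sketch treats each entry in allow as a shell-style wildcard and ignores anything else the Note may specify, so consult the Note itself for the real semantics.

```python
import re
from fnmatch import fnmatchcase
from xml.dom import Node, minidom

# Pseudo-attributes inside a processing instruction, e.g. allow="...".
PSEUDO_ATTR = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')

# Hypothetical document carrying the instruction from the example above.
DOC = ('<?xml version="1.0"?>'
       '<?access-control allow="*.poly.edu *.elharo.com"?>'
       '<data/>')


def allowed_patterns(xml_text):
    """Collect the allow patterns from top-level access-control PIs."""
    doc = minidom.parseString(xml_text)
    patterns = []
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "access-control"):
            attrs = dict(PSEUDO_ATTR.findall(node.data))
            patterns.extend(attrs.get("allow", "").split())
    return patterns


def may_read(xml_text, host):
    """Assumed policy: the host must match at least one wildcard pattern."""
    host = host.lower()
    return any(fnmatchcase(host, pattern.lower())
               for pattern in allowed_patterns(xml_text))


if __name__ == "__main__":
    print(allowed_patterns(DOC))
    print(may_read(DOC, "www.elharo.com"), may_read(DOC, "evil.example.org"))
```

Note that the content of a processing instruction is just text as far as XML is concerned; the attribute-like syntax has to be parsed by the application, as the regular expression does here.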

Wednesday, June 15, 2005

The W3C Voice Browser Working Group has posted the candidate recommendation of VoiceXML 2.1, an XML vocabulary for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. According to the spec, "The popularity of VoiceXML 2.0 [VXML2] spurred the development of numerous voice browser implementations early in the specification process. [VXML2] has been phenomenally successful in enabling the rapid deployment of voice applications that handle millions of phone calls every day. This success has led to the development of additional, innovative features that help developers build even more powerful voice-activated services. While it was too late to incorporate these additional features into [VXML2], the purpose of VoiceXML 2.1 is to formally specify the most common features to ensure their portability between platforms and at the same time maintain complete backwards-compatibility with [VXML2]." New elements in VoiceXML 2.1 include data and foreach. Comments are due by July 11.

Tuesday, June 14, 2005

The W3C CSS Working Group has posted a new last call working draft of Cascading Style Sheets, level 2 revision 1. According to the abstract,

CSS 2.1 builds on CSS2 [CSS2] which builds on CSS1 [CSS1]. It supports media-specific style sheets so that authors may tailor the presentation of their documents to visual browsers, aural devices, printers, braille devices, handheld devices, etc. It also supports content positioning, table layout, features for internationalization and some properties related to user interface.

CSS 2.1 corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's "style" attribute and a new calculation of the 'clip' property), and adds a few highly requested features which have already been widely implemented. But most of all CSS 2.1 represents a "snapshot" of CSS usage: it consists of all CSS features that are implemented interoperably at the date of publication of the Recommendation.

CSS 2.1 is derived from and is intended to replace CSS2. Some parts of CSS2 are unchanged in CSS 2.1, some parts have been altered, and some parts removed. The removed portions may be used in a future CSS3 specification. Implementations may refer to CSS2 for the definitions of features that have been removed, but for other features CSS 2.1 is the normative reference.

Comments are due by July 15.

Kiyut has released Sketsa 3.2, a $49 payware SVG editor written in Java. Version 3.2 fixes bugs. Java 1.4.1 or later is required.

Monday, June 13, 2005

Benjamin Pasero has released RSSOwl 1.1.2, an open source RSS reader written in Java and based on the SWT toolkit. RSSOwl is the best open source RSS client I've seen written in Java. That said, it still doesn't feel right to me. Even ignoring various small bugs and user interface inconsistencies, news just doesn't flow in this client. The three-pane layout that separates the news item titles from each news item doesn't work well for me.

Friday, June 10, 2005

Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.7, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. Version 1.4.7 fixes bugs. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

Antenna House, Inc. has released XSL Formatter 3.3 for Linux and Windows. This tool converts XSL-FO files to PDF. New features in 3.3 include SVG output, text and property search in the GUI, printer marks, PDF annotations, spot color, and JPEG 2000 support. The lite version costs $300 and up on Windows and $900 and up on Linux/Unix, but is limited to 300 pages per document. Prices for the uncrippled version start around $1250 on Windows and $3000 on Linux/Unix.

Dashamir Hoxha has posted version 0.8 of DocBook Wiki, an open source Wiki that can display and edit DocBook documents online. Editing can be done in text, HTML, or XML but the data is always stored in DocBook XML which is automatically converted to other formats when served to browsers. This release switches from CVS to Subversion.

Thursday, June 9, 2005

IBM's developerWorks has published Native XML databases: Theory and Reality, the third in my own ongoing series on managing XML data.

Wednesday, June 8, 2005

Sonic Software has released Stylus Studio 6 XML Enterprise Edition, a $995 payware XML editor for Windows. Features include:

XML differencing
XSLT debugging
XSLT mapping
XSLT profiling
XSL-FO
XQuery editing, mapping, and debugging.
XML Schema Editor
Document Type Definition (DTD) Editor
XPath Evaluator
XPath Expression Generator
Web Service Call Composer
UDDI Registry Browser
Tools for mapping to and from XML documents, Web service data, relational data, and flat files
Import/export utilities for RDBMS, XML, CSV, ADO, and flat files
JSP Editor
XSLT 2.0 Editor and Debugger
Supports the July 2004 XQuery 1.0 working drafts
Convert flat files, binary data, EDI, and other formats to XML
XML Schema Editor
XML grid view for editing tabular XML data

New features in the enterprise edition include XML catalog support, support for the Marklogic XQuery Processor, XQuery and XSLT output validation, an XQuery profiler, an XQuery URI resolver, EDI/EDIFACT/x12 Data Conversion, and a Java code generator.

Tuesday, June 7, 2005

The W3C XSL and XML Query Working Groups have published updated working drafts of XPath Requirements Version 2.0 and XML Query (XQuery) Requirements.

They've also published the last call working draft of XQuery 1.0 and XPath 2.0 Formal Semantics. Comments are due by July 15.

Finally they've published a new working draft of XQuery Update Facility Requirements. XQuery updates are just getting started, and are much less far along than XPath 2/XSLT 2/XQuery 1.0.

I continue to be astonished at just how bad Spotlight is. This is what everyone was so excited about? Not only does it fail to limit itself to showing me the top hits; when it does show me all 452 hits for a simple search on "XPath Requirements", it doesn't even sort them by relevance. It knows which hit's the most relevant because it picks it as the top hit in the pop up menu. However, I can't open it from there in my text editor. It insists on opening the file in my browser. And when I try to use the standard "type the first few characters of the file name" approach to jump to the file I want in the results window, Spotlight instead starts a new search. And then there's no back button to return to the previous search. Back buttons are what? 10 year old technology? Isn't Apple the company that practically invented Undo? This is such astonishingly bad design. It is a real triumph of technology over user interface. Spotlight may use fancier algorithms than Google or Firefox or BBEdit, but it's impossible to tell because it's so hobbled by bad user interface design.

Nikolai Grigoriev has released SVGMath, a MathML formatter that produces SVG, written in pure Python and published under an MIT license. According to Grigoriev, "In its current shape, the tool covers most of the Presentation MathML. It copes reasonably well with the presentation part of the MathML Test Suite, making me hope it might be useful in its current shape."

Monday, June 6, 2005

The W3C RDF Data Access Working Group has published the second public working draft of SPARQL Protocol for RDF. "The RDF Query Language SPARQL expresses queries over RDF graphs. This document employs WSDL 2.0 to define a protocol for conveying those queries, as well as other operations, to an RDF query processing services and conveying the results of such queries and operations to the entity that requested them. This document also describes an RDF vocabulary for describing the capabilities and characteristics of RDF query processors."
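To make "conveying those queries" a little more concrete, here is a minimal sketch of how a client might package a query for an HTTP GET request. The endpoint URL is hypothetical, and the parameter name query is an assumption based on the HTTP binding in later versions of the protocol, not something taken from this draft.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "http://example.org/sparql"


def build_query_url(endpoint, sparql):
    """Return a GET URL that carries the query text in a 'query' parameter.

    The parameter name is an assumption; check the protocol draft in force.
    """
    return endpoint + "?" + urlencode({"query": sparql})


sparql = "SELECT ?title WHERE { ?book <http://purl.org/dc/elements/1.1/title> ?title }"
url = build_query_url(ENDPOINT, sparql)
print(url)
```

Nothing is sent over the network here; the sketch only shows the percent-encoding step that any such client has to get right.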

The W3C RDF Data Access Working Group has also published a new public working draft of SPARQL Query Results XML Format.

Antoine Quint has implemented Static SVG Tiny 1.2 using the WHAT Working Group's <canvas> element.

I've been trying to use Apple's Spotlight in Mac OS X Tiger lately. What astonishes me about this product is how as normally user-experience focused a company as Apple could have screwed up this badly. I'd expect this from Microsoft, but Apple?! This is a search engine technology thoroughly mired in the 1980s. They completely missed the lesson of Google. The problem is not to find every occurrence of a phrase anywhere it might be found as quickly as you can possibly find that. This just leaves the user floundering in hundreds or thousands of hits, with no easy way to choose between them. The real problem is to show the user only the hits they want to see, and Spotlight just doesn't do this. When I do a Spotlight search I struggle to find the one file I know is there somewhere amidst many, many irrelevant results. Most of the time I give up and go back to a BBEdit multifile search. This may not be quite as fast as Spotlight is at finding things, but it makes it much faster for me to find the one file I'm actually looking for.

Sunday, June 5, 2005

The W3C XHTML working group has published the seventh public working draft of XHTML 2.0. XHTML 2.0 is the next, backwards incompatible version of HTML that incorporates XFrames, XForms, and lots of other crunchy XML goodness. However, XLink and xml:id are not yet included, and XLink may never be. (The HTML Working Group are extreme XLink skeptics.) "This version includes an early implementation of XHTML 2.0 in RELAX NG [RELAXNG], but does not include the implementations in DTD or XML Schema form." (It's interesting that even the W3C working groups are starting to prefer RELAX NG.) It's not immediately clear what's new since the sixth draft.

Friday, June 3, 2005

Recordare has released version 1.1 of MusicXML, an XML application for common Western music notation used in printed sheet music. "The big change is the addition of many new features for music formatting. Files saved in the MusicXML 1.1 format can now include full information about how notes, symbols, measures, staves, systems, credits, and pages appear in a printed score." New elements include defaults, credit, scaling, page-layout, system-layout, staff-layout, measure-layout, barre, harp-pedals, scordatura, tremolo, pluck, and staff-size.

YesLogic has posted the first beta of Prince 5.0, a $295 payware batch formatter for Linux, Windows, and Mac OS X that produces PDF and PostScript from XML documents with CSS stylesheets. New features in 5.0 include Unicode, PDF links, bookmarks and security, footnotes, cross-references and CSS positioning.

Wednesday, June 1, 2005

Syntext has released Serna 2.2, a $199 payware XSL-based WYSIWYG XML Document Editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, and spell checking. Version 2.2 adds XInclude and redlining.

Tuesday, May 31, 2005

The Apache Web Services Project has posted a beta of Apache XML-RPC 2.0, a Java class library for talking to XML-RPC based servers. Version 2.0 allows the use of different HTTP implementations, including a lightweight version of the Jakarta Commons HTTP Client. Disturbingly, this bundles a non-conformant XML parser. Yeah, like I'm really going to leave the XML plumbing and protocol details to the libraries when I can't even trust them to get basic things like XML parsing right.
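For anyone who hasn't looked at the plumbing in question, it is not much XML. The following is a minimal sketch using Python's standard xmlrpc.client module, not the Apache library discussed above. The method name and arguments are made up for illustration.

```python
import xmlrpc.client

# Hypothetical method name and arguments, used only for illustration.
payload = xmlrpc.client.dumps(
    ("XML in a Nutshell", 3), methodname="library.findEdition"
)
print(payload)  # the <methodCall> document that would be POSTed to a server

# Decoding the same document recovers the arguments and the method name.
params, method = xmlrpc.client.loads(payload)
```

Printing the payload makes it easy to see exactly what a conformant parser on the other end is expected to accept.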

The W3C Synchronized Multimedia Working Group has posted the candidate recommendation of the Synchronized Multimedia Integration Language (SMIL 2.1). SMIL 2.1 has four goals:

Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behaviour of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.
Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. For example, SMIL components are used for integrating timing into XHTML and into SVG.
Extend the functionalities contained in the SMIL 2.0 into new or revised SMIL 2.1 modules.
Define new SMIL 2.1 Mobile Profiles incorporating features useful within the mobile industry.

Sunday, May 29, 2005

Apparently the Firefox Location bar problem that's been bedeviling me is a known bug in Firefox and will be fixed in 1.1. Hence the fix is to upgrade to a Firefox 1.1 nightly build, where indeed everything (or at least this one issue) seems to just work. Credit (and a free book) to Simon Montagu for finding this.

John Cowan has posted the third release candidate of TagSoup, an open source, Java-language, SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. RC3 improves handling of embedded CSS and JavaScript, and moves from Java properties to command line options. TagSoup is dual licensed under the Academic Free License and the GPL.

Saturday, May 28, 2005

A lot of people have sent in various suggestions about making Firefox do Google searches, but none of them meet the stated requirements: simply typing in a few keywords in the URL bar and pressing return, same as I would if I were typing in a real URL. They all involve an extra step of one kind or another. I'm beginning to think this is going to require hacking Firefox.

I did wonder how Mozilla did it so I went back and checked. Apparently I wasn't remembering it quite right. The difference between Mozilla and Firefox here is that when I type keywords into the Location bar, Mozilla tries to match that with a previously seen URL and provides the list of potential matches in a nice little popup menu like this one:

Firefox does the same thing except that it skips the last entry. That is, it never offers me a Google search as an option. I'll expand the question a little bit. Does anyone know how, if not exactly to solve the original problem, at least to make Firefox's location bar behave like Mozilla's and give me the option for a Google search in the URL autocomplete?

Dashamir Hoxha has posted version 0.7.2 of DocBook Wiki, an open source Wiki that can display and edit DocBook documents online. Editing can be done in text, HTML, or XML but the data is always stored in DocBook XML which is automatically converted to other formats when served to browsers.

Friday, May 27, 2005

Here's a problem that's been bugging me for a few months. In Firefox I like to be able to type in a phrase or a keyword in the URL bar (not the search bar) and have it do a Google search. That works about 80% of the time, except when the search term includes a period. In that case, no matter what, Firefox interprets the search as a host name and responds with a 404. For instance, I can search for "Java 5" but not "Java 1.5". I could swear I had this working in Mozilla, but I've never been able to get Firefox to act like this. Does anyone know the last magic config option to make Firefox allow searches on the URL bar when the search terms contain periods? I'll offer a free copy of XML in a Nutshell (or another one of my books of your choosing) to the first reader who responds with a solution that fixes this problem.

Thursday, May 26, 2005

Wolfgang Hoschek has released NUX 1.2, an open source add-on package for XOM that connects it to Michael Kay's Saxon 8 XSLT 2/XPath 2/XQuery processor and the Sun Multi-Schema Validator. It also provides thread-safe factories and pools for creating XOM Builder objects. NUX also includes yet another non-XML binary format. Version 1.2 adds full text search based on Apache Lucene. NUX is published under a modified BSD license (no advertising clause).

Sleepycat Software has released Berkeley DB XML 2.1.8, an open source "application-specific, embedded data manager for native XML data" based on Berkeley DB. It supports the July working drafts of XQuery 1.0 and XPath 2.0. It includes C++, Java, Perl, Python, TCL and PHP APIs. 2.1.8 is a bug fix release, especially focusing on compatibility with GCC 4.0.

Wednesday, May 25, 2005

Altsoft N.V. has released Xml2PDF 2.3, a $49 payware Windows program for converting XSL-FO and XHTML documents into PDF files. New features in 2.3 include:

WordML input
support for fo:float
side floating in XHTML
images can be used as a pattern in SVG

Kiyut has posted the second beta of Sketsa 3.2, a $49 payware SVG editor written in Java. Version 3.2 adds a direct selection tool and fixes bugs. Java 1.4.1 or later is required.

Tuesday, May 24, 2005

Version 1.1 of Chiba, an open source, web-based implementation of XForms based on servlets and XSLT, has been released. Chiba enables XForms to be used in current browsers without plugins or special requirements on the client side. This release fixes bugs and rearranges the packaging. Chiba is published under the Artistic License.

Monday, May 23, 2005

Dave Beckett has released version 1.4.6 of the Raptor RDF Parser Toolkit, an open source C library for parsing the RDF/XML, N-Triples, Turtle, RSS, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. Version 1.4.6 adds support for reading XHTML and XML as RDF triples using a GRDDL (Gleaning Resource Descriptions from Dialects of Languages) parser. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

Sunday, May 22, 2005

The xframe project has posted beta 8 of xsddoc, an open source documentation generator for W3C XML Schemas based on XSLT. xsddoc generates JavaDoc-like documentation of schemas. Java 1.3 or later is required.

Saturday, May 21, 2005

Netscape has released version 8.0.1 of its namesake web browser for Windows. This release is based on Firefox 1.0.4 instead of 1.0.3. Apparently someone at AOL finally got enough of a clue pounded into their head to realize that shipping a browser with a known security hole just to meet a deadline and a code freeze wasn't so smart. Any users should upgrade, but most people should probably just use Firefox and forget about Netscape.

Ian E. Gorman has released GXParse 1.7, a free (LGPL) Java library that sits on top of a SAX parser and provides semi-random access to the XML document. The documentation isn't very clear, but as near as I can tell, it buffers various constructs like elements until their end is seen, rather than dumping pieces on you immediately like SAX does. This release adds regular expression based pattern matching.
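To illustrate the general idea of holding an element back until its end tag has been seen, here is a minimal sketch using Python's standard ElementTree pull parser. This is not GXParse's API, which I haven't tried to reproduce; the function name and sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def complete_elements(xml_text, tag):
    """Yield each <tag> element only once its end tag has been parsed."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "end" events fire after an element's children and text are in place.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem


# Made-up sample document.
doc = "<library><book><title>XOM</title></book><book><title>SAX</title></book></library>"
titles = [book.findtext("title") for book in complete_elements(doc, "book")]
print(titles)
```

The caller sees whole book elements one at a time instead of a stream of start tags, text chunks, and end tags, which is roughly the convenience described above.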

Friday, May 20, 2005

SyncroSoft has released version 6.0 of the <Oxygen/> XML editor. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. New features in version 6.0 include a visual schema editor, diff and merge, and database import. Oxygen costs $196 with support. Upgrades are $115.

Thursday, May 19, 2005

From the browser that wouldn't die department, Netscape has released version 8.0 of its namesake web browser for Windows. Surprisingly Mac OS X and Linux have been pretty much abandoned. Netscape 8 chooses between the IE and the Gecko rendering engines based on the trustworthiness of a site. Trusted sites are displayed using the Windows built-in IE HTML renderer. Untrusted sites are displayed using Gecko. (Personally I'd prefer to use Gecko exclusively, but then that's why I use Firefox.) Presumably Netscape 8 supports XML, XSLT, XHTML, HTML, and so forth because the underlying engines do. New features include

Spyware protection
Tabbed browsing can open "all of your favorite web sites in one window automatically"
A new "Passcard Manager" for autofilling forms

Wednesday, May 18, 2005

Kiyut has posted the first beta of Sketsa 3.2, a $49 payware SVG editor written in Java. Version 3.2 fixes bugs. Java 1.4.1 or later is required.

Tuesday, May 17, 2005

Malcolm Wallace and Colin Runciman have released version 1.13 of HaXml, a bug fix release of the XML processing library for the Haskell language. According to the web page,

HaXml is a collection of utilities for using Haskell and XML together. Its basic facilities include:

a parser for XML,

a separate error-correcting parser for HTML,

an XML validator,

pretty-printers for XML and HTML.

For processing XML documents, the following components are provided:

Combinators is a combinator library for generic XML document processing, including transformation, editing, and generation.

Haskell2Xml is a replacement class for Haskell's Show/Read classes: it allows you to read and write ordinary Haskell data as XML documents. The DrIFT tool (available from http://repetae.net/~john/computer/haskell/DrIFT/) can automatically derive this class for you.

DtdToHaskell is a tool for translating any valid XML DTD into equivalent Haskell types.

In conjunction with the Xml2Haskell class framework, this allows you to generate, edit, and transform documents as normal typed values in programs, and to read and write them as human-readable XML documents.

Finally, Xtract is a grep-like tool for XML documents, loosely based on the XPath and XQL query languages. It can be used either from the command-line, or within your own code as part of the library.

Monday, May 16, 2005

Benjamin Pasero has released RSSOwl 1.1.1, an open source RSS reader written in Java and based on the SWT toolkit. RSSOwl is the best open source RSS client I've seen written in Java. That said, it still doesn't feel right to me. Even ignoring various small bugs and user interface inconsistencies, news just doesn't flow in this client. The three-pane layout that separates the news item titles from each news item doesn't work well for me.

Ranchero has released NetNewsWire 2.0, a closed source RSS client for the Mac. It's available in both free-beer lite and $29 payware versions. Version 2.0 removes the weblog editor and adds Atom support, flagged items, searching, news persistence, and an embedded browser. The payware version has a nice combined view that might actually convince me to finally start making heavy use of RSS. Or it would if the page up and page down keys worked in this view, but they don't. :-( I wonder why developers don't seem to understand that the purpose of these readers is to page through large quantities of information from multiple different sites as quickly as possible. Having to individually select each item gets in the way of that.

I guess RSS readers are a new field, but I remain amazed at the little annoyances like this that prevent me from adopting RSS/Atom. If this were open source I'd fix it myself, but it's not. If RSSOwl were written to Swing, I could fix the problems there; but since it's written to SWT, I don't have the competence to do that. Ditto for Sage. I have fixed a few bugs in Sage, but the whole XUL process is too far afield from what I normally do for me to get seriously involved. JNN is written in Java, and uses Swing as the widget toolkit. However, the developers have ignored the patches and bug reports I've sent in; and I'm not sure it's worth forking. If I want to maintain my own client, I'd probably start from scratch and release it under the GPL. The server based HTML clients have decent interfaces, but horrible privacy issues. Oh well, I suppose over time the bugs will get worked out.

Sunday, May 15, 2005

IBM's developerWorks has published Managing XML data: XML catalogs, the third article in my ongoing series on issues that arise in collections of XML documents. This article is a brief introduction to the OASIS XML Catalog specification, a means of redirecting requests for entities and DTDs to other, possibly local or modified locations. XML catalogs are a solution to a lot of common problems that arise when processing XML documents from disparate sources.
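The core of the redirection is a lookup table. Here is a minimal sketch of that lookup, assuming a catalog that uses only system and public entries. It ignores rewriteSystem, delegation, nextCatalog, xml:base, and the prefer setting, and the local paths are made up for illustration.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Illustrative catalog; the local dtds/ paths are hypothetical.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="dtds/xhtml1-strict.dtd"/>
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="dtds/xhtml1-strict.dtd"/>
</catalog>"""


def resolve(catalog_xml, public_id=None, system_id=None):
    """Return the catalog's replacement URI, or system_id if nothing matches."""
    root = ET.fromstring(catalog_xml)
    # Try system identifier entries first, then public identifier entries.
    if system_id is not None:
        for entry in root.iter("{%s}system" % CATALOG_NS):
            if entry.get("systemId") == system_id:
                return entry.get("uri")
    if public_id is not None:
        for entry in root.iter("{%s}public" % CATALOG_NS):
            if entry.get("publicId") == public_id:
                return entry.get("uri")
    return system_id


print(resolve(CATALOG, system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

A real resolver does considerably more, but the sketch shows why a catalog lets you process documents offline without editing their DOCTYPE declarations.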

Friday, May 13, 2005

The W3C XML Schema Working Group has published a note on Processing XML 1.1 documents with XML Schema 1.0 processors. This note recommends ignoring the clear language of the W3C XML Schema 1.0 specification, specifically in regard to the xsd:QName, xsd:Name, xsd:NMTOKEN, and xsd:string data types. These data types are defined in terms of XML 1.0 rules rather than XML 1.1 rules. To make matters worse, the W3C XML Schema 1.0 language really isn't flexible or powerful enough to define extension types that can handle XML 1.1 names. (I suppose it could theoretically be done, but I hate to imagine the regular expression that would do it.)

The W3C has only themselves to blame for this debacle. Multiple people warned them about the problems that led to this snafu multiple times during the development of both the W3C Schema Language and XML 1.1 and were repeatedly shouted down. Now there's a perfect storm where the individual problems in each spec are combining to create even bigger problems. Fortunately there are alternatives. XML 1.0 continues to work just fine for almost all needs, and RELAX NG is a vastly superior schema language for pretty much everything. Everyone I know doing practical work with XML is happily using XML 1.0 without even really being aware of XML 1.1. Many of these people are even more happily using RELAX NG. Not all good things flow from the W3C.

Thursday, May 12, 2005

The Mozilla Project has released version 1.0.4 of Firefox, the open source web browser that is rapidly gaining on Internet Explorer. Firefox supports HTML, XHTML, CSS, and XSLT. MathML and SVG aren't supported out of the box, but can be added. 1.0.4 is a security update that is strongly recommended for all users. They've also released version 1.7.8 of the integrated Mozilla suite with the same fixes.

Tuesday, May 10, 2005

The W3C XML Protocol Working Group has published the final version of Assigning Media Types to Binary Data in XML. This spec attempts to preserve the original MIME media type of Base-64 encoded binary data stuffed in an XML element. The mechanism by which this happens is an xmime:contentType attribute for indicating the media type of XML element content whose type is xs:base64Binary. It also defines an xmime:expectedContentTypes attribute for use in schema annotations to indicate what the contentType attribute may say. It is not expected that this work will advance to formal recommendation status.
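A minimal sketch of what the instance side looks like, using Python's standard library. The element name picture is made up for illustration, and the namespace URI is the one I believe the note uses; treat it as an assumption and check the note itself.

```python
import base64
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xmime prefix; verify against the note.
XMIME_NS = "http://www.w3.org/2005/05/xmlmime"
ET.register_namespace("xmime", XMIME_NS)


def wrap_binary(tag, data, media_type):
    """Return an element holding Base-64 data labeled with its media type."""
    elem = ET.Element(tag)
    elem.set("{%s}contentType" % XMIME_NS, media_type)
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


# The eight-byte PNG signature stands in for a real image.
picture = wrap_binary("picture", b"\x89PNG\r\n\x1a\n", "image/png")
print(ET.tostring(picture, encoding="unicode"))
```

A receiver that decodes the content can then hand the bytes to the right handler without sniffing them.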

Saturday, May 7, 2005

The W3C XKMS Working Group has posted proposed recommendations of XML Key Management Specification (XKMS) and XML Key Management Specification (XKMS) Bindings. XKMS is a set of "protocols for distributing and registering public keys, suitable for use in conjunction with the standard for XML Signatures [XML-SIG] defined by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) and companion standard for XML encryption [XML-ENC]. The XML Key Management Specification (XKMS) comprises two parts -- the XML Key Information Service Specification (X-KISS) and the XML Key Registration Service Specification (X-KRSS). These protocols do not require any particular underlying public key infrastructure (such as X.509) but are designed to be compatible with such infrastructures."

Friday, May 6, 2005

Julian Graham has posted SDOM 0.1.2, a DOM Level 3 implementation for Scheme. This is designed as an extension of Oleg Kiselyov's SXML. According to Kiselyov, "SXML is an abstract syntax tree of an XML document. SXML is also a concrete representation of the XML Infoset in the form of S-expressions." SDOM is free software, published under the GPL.

Thursday, May 5, 2005

The W3C Device Independence Working Group has posted the last call working draft of Content Selection for Device Independence (DISelect) 1.0. According to the abstract, "This document specifies a syntax and processing model for general purpose content selection or filtering. Selection involves conditional processing of various parts of an XML information set according to the results of the evaluation of expressions. Using this mechanism some parts of the information set can be selected for further processing and others can be suppressed. The specification of the parts of the infoset affected and the expressions that govern processing is by means of XML-friendly syntax. This includes elements, attributes and XPath expressions. This document specifies how these components work together to provide general purpose selection."

That sounds unobjectionable, but what the working group is really proposing is XML markup that can be added to a page to indicate which devices certain content is appropriate for. For example, this sel:expr attribute says that the image should only be displayed if the user's device has a window wider than 500 pixels and supports color.

<div sel:expr="dc:cssmq-width('px') &gt; 500 and dc:cssmq-color() &gt; 0">
  <object src="picture.png"/>
</div>

This feels more than a little like presentation based markup. This is very much like using JavaScript or server side programs to identify different browsers and send them content tailored specifically to them. This syntax is definitely easier to use and more powerful than the various JavaScript and server-side hacks people use today; but should we be doing this at all? Whatever happened to the vision of sending browsers XML documents with appropriate stylesheets and letting the client decide how to best present it? The thing that bothers me the most about this proposal is that the syntax mixes the presentation information straight into the document, rather than linking to it from a separate hints sheet. In many ways, this document seems to reflect a belief that the W3C has been going down the wrong road for the last eight years in attempting to separate content from presentation.

Wednesday, May 4, 2005

RenderX has released version 4.5 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. XEP also supports part of Scalable Vector Graphics (SVG) 1.1. New features in 4.5 include PDF output that complies with Section 508 (U.S. government accessibility rules) and repeatable XSL-FO table footers. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $3999.95.

Tuesday, May 3, 2005

Sleepycat Software has released Berkeley DB XML 2.1.7, an open source "application-specific, embedded data manager for native XML data" based on Berkeley DB. It supports the July working drafts of XQuery 1.0 and XPath 2.0. It includes C++, Java, Perl, Python, TCL and PHP APIs. 2.1.7 improves "overall product stability and made it much easier to get acquainted with Berkeley DB XML. This version is the first to be delivered as pre-compiled binaries within an automated Windows installer. If you're a Windows developer, you no longer have to compile the source code yourself. We've done that for you. The source is still included with the product as always, but now the binaries are there too." In addition the command line interface has been renamed from 'dbxml_shell' to 'dbxml' and many bugs have been fixed.

Monday, May 2, 2005

IBM's developerWorks has posted Identify XML documents, the second article in my ongoing series on managing XML data. This article addresses the often overlooked question of just how one determines that something is an XML document in the first place.
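One blunt test is simply to try parsing the thing. The following is a minimal sketch of that approach, assuming well-formedness is the criterion you care about. The function name is made up for illustration, and this is not code from the article.

```python
import xml.etree.ElementTree as ET


def looks_like_xml(data):
    """Return True if data parses as a well-formed XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(looks_like_xml(b"<a><b/></a>"))   # well-formed
print(looks_like_xml(b"<a><b></a>"))    # mismatched tags
print(looks_like_xml(b"plain text"))    # no markup at all
```

This costs a full parse, so it is more suited to a final check than to a quick sniff of thousands of files.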

Sunday, May 1, 2005

The Mozilla Project has posted Camino 0.8.4, a Mac OS X web browser based on the Gecko 1.7 rendering engine and the Quartz GUI toolkit. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. 0.8.4 is now compatible with Mac OS X 10.4 Tiger and fixes several critical security bugs. All users should upgrade. Mac OS X 10.1.5 or later is required.

Saturday, April 30, 2005

The W3C Quality Assurance Working Group has published a new public working draft of the QA Framework: Specification Guidelines. According to the abstract, "Much effort goes into writing a good specification. It takes more than knowledge of the technology to make a specification precise, implementable and testable. It takes planning, organization, and foresight about the technology and how it will be implemented and used. The goal of this document is to help W3C editors write better specifications, by making a specification easier to interpret without ambiguity and clearer as to what is required in order to conform. It focuses on how to define and specify conformance for a specification. Additionally, it addresses how a specification might allow variation among conforming implementations. The document contains a set of guidelines or requirements, supplemented with good practices, examples, and techniques."

The W3C Quality Assurance Working Group has posted the second public working draft of Variability in Specifications. "This document analyzes how design decisions of the conformance model of a specification may affect its implementability and the interoperability of its implementations. To do so, it introduces the concept of variability - how much implementations conforming to a given specification may vary among themselves - and presents a set of well-known dimensions of variability. Its goal is to raise awareness of the potential cost that some benign-looking decisions may have on interoperability and to provide guidance on how to avoid these pitfalls by better understanding the mechanisms induced by variability."

Friday, April 29, 2005

The W3C Semantic Web Best Practices and Deployment Working Group has published the first working draft of XML Schema Datatypes in RDF and OWL. According to the abstract, "The RDF and OWL Recommendations use the simple types from XML Schema. This document discusses three questions left unanswered by these Recommendations: What URIref should be used to refer to a user defined datatype? Which values of which XML Schema simple types are the same? How to use the problematic xsd:duration in RDF and OWL? In addition, we further describe how to integrate OWL DL with user defined datatypes."

Thursday, April 28, 2005

From the "I can't believe they're going to try this again" department, I note with both hope and trepidation that the W3C XML Core Working Group has posted the first public working draft of the XML Linking Language (XLink) Version 1.1. XLink 1.0 has been one of the more obvious failures within the XML community. Adoption has been practically non-existent. On first read-through, however, the only substantive change I noted here is that XLinks contain IRIs rather than URIs. There appear to be some editorial improvements as well, but I think the basic syntax remains the same.

Update: David Carlisle found a change I missed. The xlink:type="simple" attribute is no longer required. That is, a simple link can now be written like this:

<composer xlink:href="http://www.beand.com/">Beth Anderson</composer>

It's no longer necessary to write this:

<composer xlink:type="simple" xlink:href="http://www.beand.com/">Beth Anderson</composer>

This is a good thing. I'm not sure who first came up with this idea, but I've been advocating it for a while now. This makes XLink a lot more palatable in applications like XHTML 2 and SVG.
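For software that consumes links, the change amounts to a default. Here is a minimal sketch, assuming that an element with xlink:href and no xlink:type is to be treated as a simple link, as the draft describes. The function name and wrapper document are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = "{%s}href" % XLINK_NS
TYPE = "{%s}type" % XLINK_NS


def simple_links(xml_text):
    """Return (tag, href) pairs for every simple link in the document."""
    links = []
    for elem in ET.fromstring(xml_text).iter():
        href = elem.get(HREF)
        # A missing xlink:type defaults to "simple" when xlink:href is present.
        if href is not None and elem.get(TYPE, "simple") == "simple":
            links.append((elem.tag, href))
    return links


# Made-up wrapper document containing both forms of the link.
doc = """<composers xmlns:xlink="http://www.w3.org/1999/xlink">
  <composer xlink:href="http://www.beand.com/">Beth Anderson</composer>
  <composer xlink:type="simple" xlink:href="http://www.beand.com/">Beth Anderson</composer>
</composers>"""
print(simple_links(doc))
```

Both forms of the composer element come out as the same link, so documents written the old way keep working.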

Wednesday, April 27, 2005

The W3C Multimodal Interaction Activity has posted the first public working draft of Multimodal Architecture and Interfaces. Quoting from the draft,

This document describes the architecture of the Multimodal Interaction (MMI) framework and the interfaces between its constituents. The MMI Working Group is aware that multimodal interfaces are an area of active research and that commercial implementations are only beginning to emerge. Therefore we do not view our goal as standardizing a hypothetical existing common practice, but rather providing a platform to facilitate innovation and technical development. Therefore the aim of this design is to provide a general and flexible framework providing interoperability among modality-specific components from different vendors - for example, speech recognition from one vendor and handwriting recognition from another. This framework places very few restrictions on the individual components or on their interactions with each other, but instead focuses on providing a general means for allowing them to communicate with each other, plus basic infrastructure for application control and platform services.

Our framework is motivated by several basic design goals:

Encapsulation. The architecture should make no assumptions about the internal implementation of components, which will be treated as black boxes.

Distribution. The architecture should support both distributed and co-hosted implementations.

Extensibility. The architecture should facilitate the integration of new modality components. For example, given an existing implementation with voice and graphics components, it should be possible to add a new component (for example, a biometric security component) without modifying the existing components.

Recursiveness. The architecture should allow for nesting, so that an instance of the framework consisting of several components can be packaged up to appear as a single component to a higher-level instance of the architecture.

Modularity. The architecture should provide for the separation of data, control, and presentation.

Tuesday, April 26, 2005

Jens Låås has released version 1.5.6 of xmlclitools, a set of four Linux command-line tools for searching, modifying, and formatting XML data. The tools are designed to work in conjunction with standard utilities such as grep, sort, and shell scripts. This is a bug fix release. They are published under the LGPL.

Kiyut has released Sketsa 3.1.1, a $49 payware SVG editor written in Java. This is a bug fix release. Java 1.4.1 or later is required.

Monday, April 25, 2005

The W3C Compound Document Formats Working Group has published the second public Working Draft of Compound Document by Reference Use Cases and Requirements. "The Compound Document Formats Working Group is producing recommendations on combining separate component languages (e.g. XML-based languages, elements and attributes from separate vocabularies), like XHTML, SVG, XForms, MathML, and SMIL, with a focus on user interface markups. When combining user interface markups, specific problems have to be resolved that are not addressed by the individual markups specifications, such as the propagation of events across markups, the combination of rendering or the user interaction model with a combined document. The Compound Document Formats working group will address this type of problems. This work is divided in phases and two technical solutions: combining by reference and by inclusion. The group is addressing the semantics of combining markups, which goes beyond the mechanics and syntactical elements used to combine markups. The semantic of combining markup is, to a large extent, specific to any two markups being combined. For example, including SVG markup in an XHTML document can be done in various ways and there is a need to define how the combination is done and what it means, especially with regards to issues mentioned above (such as event propagation, user interactions or rendering)."

This goes a little beyond specs like XLink and XInclude, in seeking to specify not just how an XML infoset is created from compound documents, but how DOM and various behaviors extend and interoperate across different namespaces used or referenced in the same document. High-Level Requirements are listed as:

CDR MUST exploit existing specifications, favoring W3C specifications wherever possible and limit the definition of new markup unless absolutely required for integration purposes
CDR MUST provide the ability for content developers to describe or author rich media content define Rich Multimedia Content
CDR MUST specify a base set of formats, corresponding profiles and versions
Each CDR profile and version MUST specify, which formats can be referenced
CDR MUST specify, for each format, the element used to reference other formats, if any.
CDR MUST specify generic integration techniques
CDR MUST support temporal synchronization of dynamic content coming from multiple references, possibly with multiple references to the same source.
CDR MUST support event mechanisms that cross namespace boundaries
CDR MUST support scriptability
CDR MUST say the allowed nesting level of referencing
CDR MUST explain how scripting interacts between components and the parent document
CDR profiles MUST specify how event propagation works across namespace boundaries.
CDR profiles MUST specify how focus traversal works with referenced documents.
CDR profiles MUST specify how link activation work with referenced document.
CDR profiles MUST specify triggering of animations across namespaces.
CDR MUST support fragment identifiers in cross-namespace interaction
CDR profiles SHOULD provide a method for adding event handlers using declarative markup for the formats it uses
CDR documents MUST cater for accessibility requirements
CDR documents MUST support dynamic updating
CDR must define its integration into the Web Architecture. It must include delivery over HTTP and should also strive to be transport independent
CDR MUST NOT prevent compression of the data
CDR MUST NOT prevent packaging of the data
CDR User Agents MUST provide a default font for use by all components
CDR MUST NOT prevent server-side adaptation
CDR MUST support limited bandwidth networks and limited capability devices
CDR Profiles MUST define clear document conformance criteria
CDR Profiles MUST define clear user agent conformance criteria

Requirements for CDR Profile 1 (Rich Multimedia Content) are given as:

CDR Profile 1 MUST specify a user interaction model
CDR Profile 1 MUST explain how a User Agent is able to identify a CDR Profile 1 document
CDR Profile 1 MUST support 2D scalable vector graphics
CDR Profile 1 MUST support audio
CDR Profile 1 SHOULD support video
CDR Profile 1 MUST support grid, flow, overlapping layouts
CDR Profile 1 MUST support SVG backgrounds
CDR Profile 1 MAY support XHTML backgrounds
CDR Profile 1 MUST support identification of markup and versions in CDF documents
CDR Profile 1 MUST support scalable diagrams that can be animated and can cause link traversal
CDR Profile 1 MUST define how to reference SVGT graphics and resources from an XHTML document
CDR Profile 1 MUST support advertising the specific supported versions of formats and capabilities in headers
CDR Profile 1 MUST support XHTML as a root/host language
The XHTML <object> element MUST be used for referring to other formats from XHTML
CDR Profile 1 MUST define the interaction model for an SVG document referenced by an XHTML document
CDR Profile 1 MUST define for animated SVG icons to act like HTML images (no need for interactivity, links, zoom and pan)
CDR Profile 1 MUST define a way for events to trigger SVG animation
CDR Profile 1 MUST define the process for real-estate negotiation between an XHTML document and a referenced SVG document
CDR Profile 1 MUST define handling of leftover SVG area
CDR Profile 1 MUST define system font support in SVG
CDR Profile 1 SHOULD provide temporal synchronization with dynamic media
CDR Profile 1 MAY provide functionality to stop and start media objects
CDR Profile 1 MUST support a unified rendering and processing model
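
To make the "by reference" idea concrete, here's a minimal sketch of an XHTML host document that points at an SVG file through the object element, as the Profile 1 requirements call for. The file name chart.svg, the dimensions, and the build_host_document helper are my own placeholders, not anything taken from the draft.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML as the default namespace


def build_host_document(svg_url):
    """Build an XHTML tree whose <object> element references an SVG resource."""
    html = ET.Element(f"{{{XHTML_NS}}}html")
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    obj = ET.SubElement(
        body,
        f"{{{XHTML_NS}}}object",
        {"data": svg_url, "type": "image/svg+xml", "width": "200", "height": "100"},
    )
    # Fallback content for user agents that cannot render the referenced format.
    obj.text = "A chart (SVG not supported)."
    return html


doc = build_host_document("chart.svg")  # "chart.svg" is a hypothetical file name
print(ET.tostring(doc, encoding="unicode"))
```

Everything the working group is worried about (events, focus, real-estate negotiation) happens at the boundary this one element creates.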

Sunday, April 24, 2005

I've posted the fifth alpha of XQuisitor, my GUI tool for querying XML documents based on XQuery and Saxon. Alpha 5 now works with Saxon 8 (the previous version required Saxon 7). This release has received minimal testing, but I hope to change that in the near future. XQuisitor is published under the GPL with a special exception that allows linking to the MPL'd Saxon.

Saturday, April 23, 2005

The W3C Web Services Addressing Working Group has posted a new working draft of Web Services Addressing - WSDL Binding. According to the abstract, "Web Services Addressing provides transport-neutral mechanisms to address Web services and messages. Web Services Addressing 1.0 - WSDL Binding (this document) defines how the abstract properties defined in Web Services Addressing 1.0 - Core are described using WSDL." Interesting note: "The Web Services Addressing Working Group has decided to use XML Schema, where appropriate, to describe constructs defined in this specification. Note that this restricts use of Web Services Addressing to XML 1.0." Changes in this draft are mostly editorial, and include using IRIs instead of URIs.

Friday, April 22, 2005

Sébastien Cramatte has posted xslt2Xforms 0.7.8, an XSLT stylesheet that enables partial XForms support via XHTML, JavaScript, and CSS. Version 0.7.8 now works in IE 6, and uses Sarissa for all XML functionality.

Thursday, April 21, 2005

The W3C RDF Data Access Working Group has published the third public working draft of SPARQL Query Language for RDF. According to the introduction,

An RDF graph is a set of triples, each triple consisting of a subject, a predicate and an object, as defined in RDF Concepts and Abstract syntax. These triples can come from a variety of sources. For instance, they may come directly from an RDF document. They may be inferred from other RDF triples. They may be the RDF expression of data stored in other formats, such as XML or relational databases.

SPARQL is a query language for getting information from such RDF graphs. It provides facilities to:

extract information in the form of URIs, blank nodes, plain and typed literals.

extract RDF subgraphs.

construct new RDF graphs based on information in the queried graphs.

As a data access language, it is suitable for both local and remote use. When used across networks, the companion SPARQL Protocol for RDF document describes a remote access protocol.

Here's a simple example SPARQL query adapted from the draft:

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
PREFIX  : <http://example.org/book/>
SELECT  $var
WHERE   { :book1  dc:title  $var }

The $ indicates a variable name. This query stores the title of a book in a variable named var. There are boolean and numeric operators as well.
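
For readers who haven't met triple patterns before, here's a toy sketch of what evaluating that one pattern amounts to. It is not a SPARQL implementation; the sample triples and the match_pattern helper are made up for illustration.

```python
# Toy evaluation of a single SPARQL-style triple pattern over an in-memory graph.
# The data and helper names below are illustrative assumptions, not from the draft.

DC = "http://purl.org/dc/elements/1.1/"
BOOK = "http://example.org/book/"

# Each triple is (subject, predicate, object).
graph = {
    (BOOK + "book1", DC + "title", "SPARQL Tutorial"),
    (BOOK + "book1", DC + "creator", "Alice"),
    (BOOK + "book2", DC + "title", "Another Book"),
}


def is_variable(term):
    """In this toy, any term starting with $ or ? is a variable."""
    return term.startswith("$") or term.startswith("?")


def match_pattern(graph, pattern):
    """Return one {variable: value} binding for each triple matching the pattern."""
    solutions = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if is_variable(term):
                name = term[1:]
                if binding.get(name, value) != value:
                    break  # the same variable would need two different values
                binding[name] = value
            elif term != value:
                break  # a constant term did not match
        else:
            solutions.append(binding)
    return solutions


# Equivalent in spirit to: WHERE { :book1 dc:title $var }
results = match_pattern(graph, (BOOK + "book1", DC + "title", "$var"))
print(results)
```

A real engine joins many such patterns and understands typed literals and blank nodes; this only shows how a variable gets bound.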

Norm Walsh has written a RELAX NG schema for XSLT 2.0.

Wednesday, April 20, 2005

I've posted the first beta release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath support. The API is now considered to be reasonably stable, and probably won't change before 1.1 final. The primary additions to beta 1 are workarounds for several more Crimson bugs in the handling of the internal DTD subset. Beta 1 also makes a few small optimizations and fixes a few bugs in XPath. Beta 1 requires Java 1.4 or later due to some changes in the underlying Jaxen XPath engine. This is a temporary situation. Beta 2 will return to the previous requirements of Java 1.2 or later.

Henry S. Thompson and Richard Tobin have released XSV 2.9, a partial W3C XML Schema Validator for Linux and Windows. There's also a web form based interface. This is a bug fix release.

The Apache Web Services Project has posted version 0.4 of JaxMe 2, an open source implementation of the Java API for XML Binding. Quoting from the web page,

JaxMe 2 is an open source implementation of JAXB, the specification for Java/XML binding.

A Java/XML binding compiler takes as input a schema description (in most cases an XML schema but it may be a DTD, a RelaxNG schema, a Java class inspected via reflection or a database schema). The output is a set of Java classes:

A Java bean class compatible with the schema description. (If the schema was obtained via Java reflection, then the original Java bean class.)

An unmarshaller that converts a conforming XML document into the equivalent Java bean.

Vice versa, a marshaller that converts the Java bean back into the original XML document.

In the case of JaxMe, the generated classes may also

Store the Java bean into a database. Preferably an XML database like eXist, Xindice, or Tamino, but it may also be a relational database like MySQL. (If the schema is sufficiently simple. :-)

Query the database for bean instances.

Implement an EJB entity or session bean with the same abilities.

In other words, by simply creating a schema and running the JaxMe binding compiler, you have automatically generated classes that implement the complete workflow of a typical web application:

Version 0.4 adds support for nested groups and the indexed collection type.

Late Night Software has released version 2.8 of its free-beer, expat based XML Tools AppleScript scripting addition. This is a bug fix release.

Kiyut has released Sketsa 3.1, a $49 payware SVG editor written in Java. Version 3.1 adds various small features including text selection through the mouse. Java 1.4.1 or later is required.

Tuesday, April 19, 2005

I'm pleased to announce that I'll be speaking at Software Development Best Practices in Boston in September. This will be my first time at this show since it changed its name and focus from Software Development East. I'll be presenting four sessions:

Testing GUIs with Abbot and Costello
Testing XML
User Interface Principles in API Design
Effective XML

I'll also be hosting a round table on "XForms, Web Forms, or What? The next generation of rich user interfaces". I should stress that, today's quote notwithstanding, I don't have a strong opinion about what the right answer is (or answers are) to the question of what next generation client technology we should be using. I've spent some time looking at XForms myself, but I haven't really explored XUL, XAML, WebForms 2.0, and other possibilities. I do hope, however, that we'll have some strong opinions on the panel from various camps. Anyone interested in participating as a panelist around that table should drop me a line. Ideally I'd like to have representatives from the Microsoft, Mozilla, WHAT, and XForms camps.

Opera Software has released version 8.0 of their namesake web browser for Windows, Solaris, FreeBSD, and Linux. A Mac version is still in beta. New XMLish features in 8.0 include SVG Tiny, XHTML+Voice, and XmlHttpRequest. Other major new features in 8.0 include speech-enabled browsing, fit-to-window width, easy retrieval of closed pages and blocked pop-ups, and inline error pages. Opera supports HTML, XML, XHTML, RSS, WML 2.0, and CSS. However, XSLT is still not supported. Opera is $39 payware.

Linspire has posted the fourth pre-release of NVU (pronounced N-view) 1.0, an open source GUI HTML editor for Mac OS X, Linux, and Windows based on Mozilla Composer. The software looks reasonably slick, but I'm still not ready to give up BBEdit for it. The general flow of using the application is pretty rough. Saving is poorly designed, with an unnecessary distinction between publishing and saving, and lots of confusing dialog boxes. I could figure it out if I really wanted to, but my wife couldn't. If there's any way to check the links in a document, I couldn't find it. Mozilla Composer was never a particularly good HTML editor in the first place, and while NVU cleans up some of the more obvious editing problems, it's done little to fix the underlying problems. If I'm giving a GUI editor to someone who can't type HTML, I don't expect them to understand FTP or URL syntax either. Here's my test for a useful GUI editor that NVU flunked massively:

Spot a typo.
Edit the word to fix the typo, just like I would in my word processor.
Select File/Save or Ctrl-S, and have the page saved with the typo corrected.

Note what is not involved here:

Figuring out where on the site to save the page. The page already exists. Put it in the same place.
Giving the name of the file. The page already has a name, even if it's index.html and I edited "http://www.cafeaulait.org/"

I could (barely) accept the editor asking for a user name and a password, provided it only asked once and never asked for it if it didn't need it. But that's it. Otherwise, editing a page should be just as easy as editing a file in a word processor like Word, or even OpenOffice (to lower the bar some). NVU doesn't come close to that standard.

Speaking of OpenOffice, Louis Suarez-Potts writes, "A security vulnerability affecting OpenOffice.org 1.1.4 and earlier, as well as 2.0beta, including the developer builds, was recently detected. It has been fixed and a patch is available for immediate download for all users of OpenOffice.org 1.1.4. Users of earlier releases (1.1.3 and prior) must upgrade. Users of 2.0beta are requested to download the latest beta, OpenOffice.org 1.9.95. It will include the patch and be ready shortly."

Monday, April 18, 2005

Apple has released Mac OS X 10.3.9. I'm very hesitant to install this update (available through System Update) because it seems to be breaking Java left and right. Many Java developers and end users of Java applications such as Limewire are reporting core dumps and kernel panics after installing this update. (Reinstalling Security Update 2005-002 may fix the problem.) Nonetheless, there is one new feature in this update that's of interest to XML developers. 10.3.9 updates the Safari web browser to 1.3 which, among other new features, finally adds support for client-side XSLT. The XSLT engine is the Gnome Project's libxslt. Now if we could just get Opera and Lynx under the XSLT tent, we could finally start publishing XML+XSLT on the Web.
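
Client-side XSLT hangs off the xml-stylesheet processing instruction in the document prolog. Here's a rough sketch of the lookup a browser has to perform before it can transform anything. The sample document, the news.xsl file name, and the find_xslt_stylesheet helper are hypothetical, and the pseudo-attribute parsing is deliberately simplistic (double-quoted values only).

```python
import re
from xml.dom import minidom

# Hypothetical document; "news.xsl" is a made-up stylesheet name.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="news.xsl"?>
<news><item>Safari 1.3 adds XSLT</item></news>"""

# Media types this sketch treats as "this is an XSLT stylesheet".
XSLT_TYPES = ("text/xsl", "application/xslt+xml")


def find_xslt_stylesheet(xml_text):
    """Return the href of the first XSLT xml-stylesheet PI in the prolog, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # Pseudo-attributes look like attributes but live in the PI's data.
            pseudo = dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
            if pseudo.get("type") in XSLT_TYPES:
                return pseudo.get("href")
    return None


print(find_xslt_stylesheet(SAMPLE))
```

A browser without XSLT support simply ignores the instruction and shows the raw document, which is why uneven browser coverage has held back publishing XML+XSLT directly.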

The Mozilla Project has released version 1.0.3 of Firefox, the open source web browser that is rapidly gaining on Internet Explorer. Firefox supports HTML, XHTML, CSS, and XSLT. MathML and SVG aren't supported out of the box, but can be added. 1.0.3 is a security update that is recommended for all users. They've also released version 1.7.7 of the integrated Mozilla suite with the same fixes.

Toni Uusitalo has posted Parsifal 0.9.2, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Version 0.9.2 is a bug fix release. Parsifal is in the public domain.

Sunday, April 17, 2005

The W3C Scalable Vector Graphics Working Group has posted the last call working draft of Scalable Vector Graphics (SVG) Tiny 1.2 and a placeholder document for the next version of Scalable Vector Graphics (SVG) 1.2 Full. According to the SVG Full Draft (which is only about two pages long):

The previous drafts of this specification expressed the SVG 1.2 Full language as extensions to SVG 1.1 Full. SVG 1.2 Tiny was expressed as a profile of this specification and referenced both SVG 1.1 and SVG 1.2 Full. Feedback from implementors and reviewers indicated that this made both specifications unnecessarily difficult to understand. The latest draft of SVG 1.2 Tiny is described as a complete language specification, with no dependencies on other SVG specifications. In future drafts SVG 1.2 Full be will refactored as extensions to SVG 1.2 Tiny, forming a superset. It will not have a dependency on the SVG 1.1 Full specification, only on SVG Tiny 1.2. At this time the refactored SVG 1.2 Full specification is not ready for publication. Therefore this placeholder documentation is being published to inform the SVG community and provide redirection of links for other referencing documentation.

Beyond this major editorial shift, changes in the tiny draft are substantive but detailed. They include

IRIs are used instead of URIs.
Attributes with boolean values have been changed to use enumerations, to allow extensibility.
wallclock has been removed.
The page and pageSet elements have been removed.
Added discard element and playbackOrder attribute.
Removed streamedContents attribute.
The opacity property has been added to the image element.
The textArea and associated elements replace the previous Flowing Text features.
'editable' is now allowed on text elements with children, but the children are flattened when editing occurs
The animation element has been added.
The overlay attribute was added to the video element.
The transformBehaviour attribute was added to the video element.
Run-time synchronisation attributes on audio, video and animation.
The focusNext and focusPrev attributes have been added in preference to navIndex.
8-way navigation is now allowed through the focusNorth, focusNorthEast, etc... attributes.
viewport-fill and viewport-fill-opacity replace background-fill and background-opacity.

Saturday, April 16, 2005

I've posted beta 5 of Jaxen 1.1, an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Jaxen was originally written by James Strachan and Bob McWhirter. Don't be fooled by the "beta" designation. I've spent a lot of time over the last three months fixing bugs in Jaxen. This release has many fewer bugs and is much more conformant to the XPath specification than the official 1.0 release. We'll probably get around to calling it 1.1 final sometime later this year after doing more work on testing, documentation, performance, and code cleanup. However, there's no reason to wait for that. If you're using Jaxen, you should upgrade to this beta. Beta 5 fixes some bugs in DOM and XOM navigation, improves the JavaDoc, and can now be built from the source distribution using Maven.

Friday, April 15, 2005

IBM's developerWorks has published Managing XML data: A look ahead. This is the first article in an ongoing series in which I intend to explore issues that arise in collections of XML documents. Up till now a lot of my books and writings have focused on how to process individual XML documents. However, some things that are easy when looking at documents one or two at a time become much more challenging when you have hundreds, thousands, or even millions to deal with. For instance, how do you even find the one document that contains the information you need from amidst such a large collection? There's not a lot in XML and its related specifications to help you with this. This first article introduces the series. Future articles will cover catalogs, MIME types, storing XML in databases, and more. If there's anything in particular you'd like to hear about in this series, please drop me a line.

Thursday, April 14, 2005

Opera Software has posted the third beta of version 8.0 of their namesake web browser for Windows. This beta adds native Scalable Vector Graphics (SVG) support for the first time. So far it only supports SVG Tiny. Other major new features in 8.0 include speech-enabled browsing (including support for XHTML+Voice), medium-screen rendering, and inline error pages. Opera supports HTML, XML, XHTML, RSS, WML 2.0, and CSS. However, XSLT is still not supported. Opera is $39 payware.

Wednesday, April 13, 2005

Dennis Sosnoski has posted the first release candidate of JiBX, yet another open source (BSD license) framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding document camp as opposed to the schema driven binding frameworks like JaxMe and JAXB. Quoting from the JiBX web site,

JiBX is a framework for binding XML data to Java objects. It lets you work with data from XML documents using your own class structures. The JiBX framework handles all the details of converting your data to and from XML based on your instructions. JiBX is designed to perform the translation between internal data structures and XML with very high efficiency, but still allows you a high degree of control over the translation process.

How does it manage this? JiBX uses binding definition documents to define the rules for how your Java objects are converted to or from XML (the binding). At some point after you've compiled your source code into class files you execute the first part of the JiBX framework, the binding compiler. This compiler enhances binary class files produced by the Java compiler, adding code to handle converting instances of the classes to or from XML. After running the binding compiler you can continue the normal steps you take in assembling your application (such as building jar files, etc.; as of Beta 3a you can also skip this as a separate step and instead binding classes directly at runtime, though this approach has some drawbacks).

The second part of the JiBX framework is the binding runtime. The enhanced class files generated by the binding compiler use this runtime component both for actually building objects from an XML input document (called unmarshalling, in data binding terms) and for generating an XML output document from objects (called marshalling). The runtime uses a parser implementing the XMLPull API for handling input documents, but is otherwise self-contained.
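
If the marshalling/unmarshalling vocabulary is new to you, here's a minimal sketch of the idea. To be clear, this is not JiBX and not Java: JiBX reads an XML binding definition and enhances compiled class files, whereas this toy just interprets a dictionary at runtime. The Customer class, the BINDING table, and the element names are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Customer:
    """An ordinary class of your own design; it knows nothing about XML."""
    name: str
    city: str


# Hypothetical binding: the element for the class, then field -> child element name.
BINDING = {"element": "customer", "fields": {"name": "full-name", "city": "city"}}


def marshal(obj, binding):
    """Object -> XML text, following the binding."""
    root = ET.Element(binding["element"])
    for field, tag in binding["fields"].items():
        ET.SubElement(root, tag).text = getattr(obj, field)
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text, cls, binding):
    """XML text -> object, following the same binding in reverse."""
    root = ET.fromstring(xml_text)
    values = {field: root.findtext(tag) for field, tag in binding["fields"].items()}
    return cls(**values)


xml_text = marshal(Customer("Ada", "London"), BINDING)
print(xml_text)
print(unmarshal(xml_text, Customer, BINDING))
```

The point of the custom-binding approach is visible even here: the XML uses full-name while the class uses name, and only the binding knows about the difference.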

Tuesday, April 12, 2005

jCatalog Software has released XSLfast 2.0, an €890 graphical editor for XSL Formatting objects documents that supports mail merge and form processing. New features in 2.0 include:

Simplified table handling
Multiple pages and reusable layouts can be integrated in one template
Background images for table rows and cells
Attributes are now shown in the XML structure window
The contents of XML nodes can be displayed as a list

Topologi has released Difference Detective, a $29 payware Windows utility for comparing files and directories, and detecting any differences.

Monday, April 11, 2005

Norm Walsh has published DocBook NG: The “Jägermeister” Release; a.k.a. beta 10 of DocBook 5.0. DocBook NG is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD generated from the RELAX NG schema is also available. Jägermeister adds mathphrase and termdef elements.

Saturday, April 9, 2005

The W3C SVG and CSS Working Groups have posted the third public working draft of SVG's XML Binding Language (sXBL). sXBL works like a literal result element used as a stylesheet in XSLT. That is, an sXBL document is an SVG document that can contain content from other namespaces. This SVG document specifies bindings between elements in those namespaces and particular SVG shapes. When an SVG processor renders the complete document, it replaces the content from other namespaces with their SVG bindings. This would be more useful if it were also possible to have external sXBL documents (more like traditional stylesheets) that don't require the source document and the SVG to be together in one place. Perhaps this will come in sXBL 2 some time down the road. Ultimately, this would allow browsers to render XML documents that don't look remotely like text, such as MathML and MusicXML. If you're curious about this, you might be interested in An early look at sXBL I wrote for IBM developerWorks.
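
Here's a toy sketch of the binding idea as I've described it: foreign-namespace elements inside an SVG document get swapped for SVG shapes before rendering. None of this is sXBL syntax. The ex:dot vocabulary, its namespace URI, and the bind_dot and apply_bindings helpers are all hypothetical, and a real processor would presumably leave the source tree intact rather than rewriting it in place.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
EX_NS = "http://example.org/shapes"  # hypothetical foreign vocabulary
ET.register_namespace("", SVG_NS)
ET.register_namespace("ex", EX_NS)


def bind_dot(element):
    """Binding for ex:dot: render it as a small SVG circle at (x, y)."""
    return ET.Element(
        f"{{{SVG_NS}}}circle",
        {"cx": element.get("x"), "cy": element.get("y"), "r": "5"},
    )


# Map each bound element name to the function that produces its SVG rendering.
BINDINGS = {f"{{{EX_NS}}}dot": bind_dot}


def apply_bindings(parent):
    """Recursively replace bound foreign elements with their SVG bindings."""
    for index, child in enumerate(list(parent)):
        binder = BINDINGS.get(child.tag)
        if binder is not None:
            parent[index] = binder(child)
        else:
            apply_bindings(child)


SOURCE = f'<svg xmlns="{SVG_NS}" xmlns:ex="{EX_NS}"><g><ex:dot x="10" y="20"/></g></svg>'
root = ET.fromstring(SOURCE)
apply_bindings(root)
print(ET.tostring(root, encoding="unicode"))
```

Note that the bindings and the bound content travel in the same document here, which is exactly the limitation I'm complaining about.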

Kiyut has posted the second beta of Sketsa 3.1, a $49 payware SVG editor written in Java. Version 3.1 adds various small features including text selection through the mouse. Java 1.4.1 or later is required.

Friday, April 8, 2005

Norm Walsh has updated his RELAX NG schema for XSLT. This schema is capable of validating both XSLT 1.0 and XSLT 2.0 stylesheets.

Thursday, April 7, 2005

This morning I posted beta 4 of Jaxen 1.1, an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Jaxen was originally written by James Strachan and Bob McWhirter. Don't be fooled by the "beta" designation. I've spent a lot of time over the last three months fixing bugs in Jaxen. This release has many fewer bugs and is much more conformant to the XPath specification than the official 1.0 release. We'll probably get around to calling it 1.1 final sometime later this year after doing more work on testing, documentation, performance, and code cleanup. However, there's no reason to wait for that. If you're using Jaxen, you should upgrade to this beta.

Wednesday, April 6, 2005

Michael Kay has released version 8.4 of Saxon, his XSLT 2.0 and XQuery processor. This release updates Saxon to cover the latest working drafts and fixes assorted bugs. Saxon 8.4 is published in two versions for both of which Java 1.4 or later is required. Saxon 8.4B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.4SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standard XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers." Upgrades from 8.x are free.

Tuesday, April 5, 2005

The W3C XQuery working group has published one new and eleven updated working drafts. The new working draft is Building a Tokenizer for XPath or XQuery, a note that "describes possible strategies for tokenizing the [XML Path Language (XPath) 2.0] and [XQuery 1.0: An XML Query Language] languages, and is provided as a helpful guide to those who are designing an implementation for these languages, and as background material for the normative EBNF found in the language specifications."

Of the eleven updated working drafts, seven of them are in last call:

Comments on these are due by May 13. The other four not yet in last call are:

On first glance, changes since the February working drafts appear fairly minor and mostly editorial. One of the biggest substantive changes is that error handling is now more consistent across different XSLT 2 processors. Processors are less often allowed the option of whether or not to recover from an error. However at a user (as opposed to implementer) level, very little seems to have changed.

Planamesa Software has posted the release candidate of NeoOffice/J 1.1, a Mac OS X variant of OpenOffice that replaces X-Windows with Java Swing. This release is based on OpenOffice 1.1.4.

Monday, April 4, 2005

The W3C Web Services Addressing Working Group has posted two last call working drafts on the subject of, not surprisingly, web services addressing. Web Services Addressing - Core defines generic extensions to the Infoset for endpoint references and message addressing properties. Web Services Addressing - SOAP Binding describes how the abstract properties defined in the core spec are implemented in SOAP.

Sunday, April 3, 2005

As well as the guffaw-inducing XML Binary Characterization working draft, the W3C Not XML Working Group has updated three other working drafts:

What scares me about "Not XML" is that the purveyors aren't going to be satisfied with designing simple filters that convert their format into real XML for integration into the tool chain. As soon as they've defined their format, they're next going to look at "fixing" other specs to more efficiently support "Not XML". I think we need to reject this in advance. If the "Not XML" group is really serious about compatibility (which I doubt) they will agree that they will not propose or make any modifications to SAX, DOM, XPath, XSLT, XQuery, XInclude, the XML Infoset, or any other XML technology. They will not add methods to the XML APIs that receive byte arrays or double instead of char[] arrays and Strings. They will not add new properties to the Infoset. They will not add special functions to XSLT and XPath to process binary data. Of course, if they want to create new APIs and tools to process their Not XML format, that's fine, as long as they don't use the word "XML" to describe what they're doing. But I'm afraid that hijacking the underlying XML specification is just the first step toward a hostile takeover of the entire XML stack.

Saturday, April 2, 2005

Howard Katz has posted version 0.69 of XQEngine, a free-as-in-speech (GPL) full-text search engine for XML based on XQuery. XML Query Engine is a 300K embeddable component written in Java. According to Katz, "It's not a standalone application and requires a reasonable amount of Java programming skill to use. It has a straightforward programming interface that makes that fairly easy to do. It should work well as a personal productivity tool on a single desktop, as part of a CD-based application, or on a server with low to moderate traffic." Collections to be searched are limited to 2.1 billion documents, each of which can contain up to 8.4 million nodes.

Friday, April 1, 2005

The W3C XML Binary Characterization Working Group has published XML Binary Characterization. Like Captain Renault in Casablanca I'm shocked, shocked to find that the group "recommends that the W3C produce a 'binary XML' recommendation." This was pretty much a foregone conclusion from the get go. The group was formed by people who had already decided that real XML wasn't going to meet their needs. A few of them were even right about that.

According to the draft, "The driving notion behind 'binary XML' is generally that it would provide an equally interoperable format with a different set of properties"; and right there the group's gone off the rails and started wandering into oxymoronism. Even a well-documented, well-supported, well-understood, well-implemented binary format will not be as interoperable as text. Text has important characteristics of intelligibility and redundancy that a binary format will not.

The draft is careful to put "binary XML" in quotes, but it's still wrong. Binary formats may have their uses, but they aren't XML. I propose we stop calling this teratoma XML anything and let it live or die on its own merits. If it can't survive without hijacking the XML brand name, then it deserves to become extinct. But if the working group really just can't live without using the three capital letters X, M, and L, I have an alternate proposal for them. Instead of calling the format "binary XML", let's call it "Not XML". For example,

This document describes the processes and results of the Not XML Characterization Working Group in evaluating the need and feasibility of a "Not XML" recommendation. It includes an analysis of which properties such a format must possess. It recommends that the W3C produce a "Not XML" recommendation and enumerates the minimum requirements which this "Not XML" recommendation must meet.

The working group has determined a number of MUST properties for their eventual Not XML format:

Directly Readable and Writable
Transport Independence
Compactness
Human Language Neutral
Platform Neutrality
Integratable into XML Stack
Royalty Free
Fragmentable
Streamable
Roundtrip Support
Generality
Schema Extensions and Deviations
Format Version Identifier
Content Type Management
Self Contained

I predict they're not going to be able to create a format that satisfies all their musts. I also predict that this failure isn't going to stop them from recommending Not XML anyway. There's a long history of W3C working groups ignoring their requirements when they become inconvenient.

The real problem here is not the decision to invent a new binary format. It's the effort to hijack the interoperable XML standard for use cases it was never intended for, and in so doing break XML for everybody who's already using it successfully. The best case scenario is that this effort produces a spec that flops in the marketplace and is widely ignored. (cf. XML 1.1.) The worst case scenario is a universe of incompatible, opaque binary data and tools that no one can understand. There's no chance this format will succeed. No one format will meet all needs. No one format can. Uber-solutions always fail. Think ADA, EBXML, or the Edsel. The only question is how much damage Not XML will do while failing.

The Mozilla Project has posted Camino 0.8.3, a Mac OS X web browser based on the Gecko 1.7 rendering engine and the Quartz GUI toolkit. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. 0.8.3 is a bug fix release. Mac OS X 10.1.5 or later is required.

Thursday, March 31, 2005

NEW! Dashamir Hoxha has posted version 0.7.1 of DocBook Wiki, an open source Wiki that can display and edit DocBook documents online. Editing can be done in text, HTML, or XML but the data is always stored in DocBook XML which is automatically converted to other formats when served to browsers.

I've been thinking a lot about Wikis lately. One thing that has always turned me off about them is the reliance on weak plain text, when I really want to see (and receive) well-formed markup. I wonder what would happen if I uploaded the original DocBook source for Processing XML with Java into a DocBook Wiki? Almost all of the book is still fairly up to date, but there are a few bits and pieces here and there I'd like to futz with; and it would be nice to cover XOM as well. (XOM was mostly invented out of my experience writing Processing XML with Java.) I wonder if the authoring environment here is clean enough to get me (or others) to do that work? I don't have time to do this right now, but I think I will add it to my TODO list.

Wednesday, March 30, 2005

The W3C XML Schema Working Group has published the last call working draft of XML Schema: Component Designators. This spec proposes a scheme for naming and identifying XML Schema components. Such components include:

Simple and complex type definitions
Attribute declarations
Element declarations
Attribute and model group definitions
Identity-constraint definitions
Notation declarations
Annotations
Model groups
Particles
Wildcards
Attribute uses
The master schema component representing the schema as a whole.
Facets

The goal is to be able to name, for example, the literallayout notation in the DocBook schema, as well as every other significant piece of the schema. These names could then be used as fragment identifiers in URI references that point to schemas. The draft gives these examples of the current syntax proposal in both abbreviated and full forms:

schema-URI#xscd(/~Items)
schema-URI#xscd(/~Items/item)
schema-URI#xscd(/~Items/item/~0)
schema-URI#xscd(/~Items/item/productName)
schema-URI#xscd(/~Items/item/quantity)
schema-URI#xscd(/~Items/item/quantity/~0)
schema-URI#xscd(/~Items/item/quantity/~0/facet::maxExclusive)
schema-URI#xscd(/~Items/item/USPrice)
schema-URI#xscd(/comment)
schema-URI#xscd(/~Items/item/shipDate)
schema-URI#xscd(/~Items/item/@partNum)
schema-URI#xscd(/type::Items)
schema-URI#xscd(/type::Items/model::sequence/element::item)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0/model::sequence/element::productName)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0/model::sequence/element::quantity)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0/model::sequence/element::quantity/type::0)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0/model::sequence/element::quantity/type::0/facet::maxExclusive)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0/model::sequence/element::USPrice)
schema-URI#xscd(/element::comment)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0/model::sequence/element::shipDate)
schema-URI#xscd(/type::Items/model::sequence/element::item/type::0/attribute::partNum)
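To make the syntax a little more concrete, here is a rough Python sketch that splits one of these URI references into the schema URI and a list of (axis, name) steps. The function name is my own invention, and the mapping from abbreviations to axes (~ for type, @ for attribute, a bare name for element) is only inferred from the paired examples above, not taken from the draft's grammar. Note that the abbreviated forms leave out steps such as model::sequence, so the two forms do not parse to identical step lists.

```python
import re

def parse_designator(uri_ref):
    """Split 'schema-URI#xscd(/step/step)' into (schema_uri, [(axis, name), ...]).

    Hypothetical helper: the abbreviation rules below are guesses based on
    the examples in the draft, not a reading of its formal grammar.
    """
    schema_uri, sep, fragment = uri_ref.partition("#")
    match = re.fullmatch(r"xscd\((.*)\)", fragment)
    if not sep or match is None:
        raise ValueError("not an xscd() fragment: %r" % uri_ref)
    path = match.group(1)
    if not path.startswith("/"):
        raise ValueError("designator path must start with '/': %r" % path)

    steps = []
    for step in path[1:].split("/"):
        if "::" in step:
            # Full form: axis::name
            axis, name = step.split("::", 1)
        elif step.startswith("~"):
            # Abbreviated type step, e.g. ~Items or ~0
            axis, name = "type", step[1:]
        elif step.startswith("@"):
            # Abbreviated attribute step, e.g. @partNum
            axis, name = "attribute", step[1:]
        else:
            # A bare name appears to mean an element declaration
            axis, name = "element", step
        steps.append((axis, name))
    return schema_uri, steps


if __name__ == "__main__":
    print(parse_designator("schema-URI#xscd(/~Items/item/@partNum)"))
    print(parse_designator("schema-URI#xscd(/element::comment)"))
```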

The W3C Semantic Web Best Practices and Deployment Working Group has posted the first working draft of A Survey of RDF/Topic Maps Interoperability Proposals. According to the abstract, "The Resource Description Framework (RDF) is a model developed by the W3C for representing information about resources in the World Wide Web. Topic Maps is a standard for knowledge integration developed by the ISO. This document contains a survey of existing proposals for integrating RDF and Topic Maps data and is intended to be a starting point for establishing standard guidelines for RDF/Topic Maps interoperability."

Altsoft N.V. has released Xml2PDF 2.2, a $49 payware Windows program for converting XSL-FO and XHTML documents into PDF files. New features in 2.2 include:

Bidirectional text
Soft-hyphen based hyphenation
PDF and EPS as external graphics formats
SVG markers
SVG symbols

Tuesday, March 29, 2005

I've posted the third alpha release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath support. Alpha 3 removes the internal dependence on IBM's ICU, so the entire distribution is smaller and more self-contained. In fact, in Java 1.4 and later only the main XOM JAR archive should be required for almost all uses. (XOMTestCase still requires junit.jar. However, if you're using that class you probably already have JUnit in your classpath.) The new normalization code has not yet been optimized. It is almost certainly slower than the old code. Future releases will speed it up and make it smaller. The effect on anything except NFC should be nominal. In addition, a few random bugs have been fixed, and the XIncluder now tries to use relative URLs in xml:base attributes where possible.

Karl Waclawek has released SAX for .NET 1.5, a port of the Java SAX API for push-parsing XML to C#. Version 1.5 combines the core and extension APIs, and includes two parsers: AElfred 1.0, a pure C# parser, and SAXExpat 1.5, a wrapper around expat.

Bare Bones Software has released version 8.1 of BBEdit, my preferred text editor on the Mac. The major new feature is support for the Subversion source code control system. BBEdit is $179 payware. Upgrades from 8.0 are free. They're $49 for 7.0 owners and $59 for owners of earlier versions. Mac OS X 10.3.5 or later is required.

Monday, March 28, 2005

NEW! Tereshchenko Andrey has released myXML, a partial implementation of DOM, XPath, and XSLT in PHP published under the LGPL.

Saturday, March 26, 2005

The RDF Data Access Working Group has published the fourth public working draft of RDF Data Access Use Cases and Requirements. According to the introduction,

The W3C's Semantic Web Activity is based on RDF's flexibility as a means of representing data. While there are several standards covering RDF itself, there has not yet been any work done to create standards for querying or accessing RDF data. There is no formal, publicly standardized language for querying RDF information. Likewise, there is no formal, publicly standardized data access protocol for interacting with remote or local RDF storage servers.

Despite the lack of standards, developers in commercial and in open source projects have created many query languages for RDF data. But these languages lack both a common syntax and a common semantics. In fact, the extant query languages cover a significant semantic range: from declarative, SQL-like languages, to path languages, to rule or production-like systems. The existing languages also exhibit a range of extensibility features and built-in capabilities, including inferencing and distributed query.

Further, there may be as many different methods of accessing remote RDF storage servers as there are distinct RDF storage server projects. Even where the basic access protocol is standardized in some sense—HTTP, SOAP, or XML-RPC—there is little common ground upon which to develop generic client support to access a wide variety of such servers.

The following use cases characterize some of the most important and most common motivations behind the development of existing RDF query languages and access protocols. The use cases, in turn, inform decisions about requirements, that is, the critical features that a standard RDF query language and data access protocol require, as well as design objectives that aren't on the critical path.

Use cases include:

Finding an Email Address
Finding Information about Motorcycle Parts
Finding Unknown Media Objects
Monitoring News Events
Avoiding Traffic Jams
Discovering What People Say about News Stories
Exploring the Neighborhood
Sharing Vacation Photos with a Friend
Finding Input and Output Documents for Test Cases
Discovering Learning Resources
Finding Out New Things About People
Browsing Patient Records
Finding Disjunct Conditions
Finding Film Soundtracks
Managing Personal Identities
Customizing Content Delivery
Building Ontology Tools
Working with Enterprise Web Services
Building Tables of Contents

Friday, March 25, 2005

Sun has posted the second early draft review of Java Specification Request 222: Java™ API for XML Data Binding 2.0. This makes various updates to align the spec with JAX-RPC 2.0 and support schema substitution groups. Java 1.5 will be required for JAXB 2.0. Earlier versions will not be supported. Comments are due by April 22.

Thursday, March 24, 2005

The Mozilla Project has released version 1.0.2 of Firefox, the open source web browser that is rapidly gaining on Internet Explorer. Firefox supports HTML, XHTML, CSS, and XSLT. MathML and SVG aren't supported out of the box, but can be added. 1.0.2 is a security update that is recommended for all users.

Planamesa Software has posted Beta Patch-10 (whatever that means) of NeoOffice/J 1.1, a Mac OS X variant of OpenOffice that replaces X-Windows with Java Swing. This release adds support for drag and drop between NeoOffice and native Mac apps, fixes bugs, and improves performance.

Wednesday, March 23, 2005

John Krasnay has released Vex 1.2, an open source (LGPL) XML editor that features a word processor-like interface. Vex is based on the Eclipse platform. It supports DocBook 4.1.2, 4.2, 4.3, Simplified DocBook 1.0, and XHTML 1.0 Strict and can be configured for other DTDs. 1.2.0 allows new document types to be added as plug-ins, and supports DITA.

Tuesday, March 22, 2005

Mikhail Grushinskiy has released XMLStarlet 1.0.1, a command line utility for Linux that exposes a lot of the functionality in libxml and libxslt including validation, pretty printing, and canonicalization. This release fixes some bugs and has been recompiled against libxml2 2.6.18 and libxslt 1.1.13.
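For readers who have not used these operations, here is a small sketch of what a well-formedness check, pretty printing, and canonicalization do to a document. It uses Python's standard library (xml.dom.minidom and xml.etree.ElementTree, with canonicalize requiring Python 3.8 or later) rather than XMLStarlet, libxml, or libxslt, so treat it as an analogy only. The sample document and function names are made up for illustration.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Made-up sample document for illustration
SAMPLE = '<order id="7" customer="acme"><item>widget</item><empty/></order>'

def is_well_formed(xml_text):
    """Return True if the text parses as XML (well-formedness only, no DTD or schema)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def pretty_print(xml_text):
    """Re-indent the document with two spaces per nesting level."""
    return xml.dom.minidom.parseString(xml_text).toprettyxml(indent="  ")

def canonical(xml_text):
    """Serialize in canonical form (C14N 2.0 as implemented by ElementTree)."""
    return ET.canonicalize(xml_text)

if __name__ == "__main__":
    print(is_well_formed(SAMPLE))
    print(pretty_print(SAMPLE))
    print(canonical(SAMPLE))
```

The point of canonicalization is that two documents that differ only in lexical details, such as attribute order, quote style, or empty-element syntax, come out byte-for-byte identical.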

The Mozilla Project has released Mozilla 1.7.6. This release fixes assorted bugs including several security issues. Thunderbird 1.0.2 has also been released. I've been using Mozilla 1.7.x for a while now, but yesterday after increasing problems with crashes, especially in Mail, I switched over to Firefox and Thunderbird. Mozilla still has a few features Firefox doesn't, most notably a forms tool that can save and refill forms, something I find very useful for conference submissions. However, overall Firefox/Thunderbird seems to be more stable than full Mozilla, at least prior to today's update. It definitely has a prettier interface.

Monday, March 21, 2005

The W3C Compound Document Formats Working Group has published the first public Working Draft of Compound Document by Reference Use Cases and Requirements. "A compound document combines multiple formats, such as XHTML, SVG, XForms, MathML and SMIL. This draft introduces compounding by a reference like img, object, link, src and XLink. Compounding by inclusion is planned for a later phase." I think this goes a little beyond specs like XLink and XInclude, in seeking to specify not just how an XML infoset is created from compound documents, but how DOM and various behaviors extend and interoperate across different namespaces used or referenced in the same document. High-Level Requirements are listed as:

CDR MUST exploit existing specifications, favoring W3C specifications wherever possible
CDR SHOULD NOT define new markup unless absolutely required for integration purposes
CDR MUST provide the ability to define Rich Multimedia Content
CDR MUST specify a base set of formats, corresponding profiles and versions
Each CDR profile and version MUST specify, which formats can be referenced
CDR MUST specify, for each format, the element used to reference other formats, if any.
CDR MAY specify generic integration techniques
CDR MUST support temporal synchronisation of dynamic content coming from multiple references, possibly with multiple references to the same source.
CDR MUST ensure that user agent controls work consistently regardless of the component that has focus
CDR MUST support event mechanisms that cross namespace boundaries
CDR MUST support scriptability
CDR MUST say the allowed nesting level of referencing
CDR MUST explain how scripting interacts between components and the host document
CDR MUST explain which events get dispatched to the referenced document
CDR profiles MUST define how eventing (such as events, focus or links) work across namespaces
CDR profiles MUST specify how event propagation works across namespace boundaries.
CDR profiles MUST specify how focus traversal works with referenced documents.
CDR profiles MUST specify how link activation work with referenced document.
CDR profiles MUST specify triggering of animations across namespaces.
CDR MUST support fragment identifiers in cross-namespace interaction
CDR MUST support a unified rendering model
CDR profiles SHOULD provide a method for adding event handlers using declarative markup for the formats it uses
CDR documents MUST cater for accessibility requirements
CDR documents MUST support dynamic updating
CDR must define its integration into the Web Architecture. It must include delivery over HTTP and should also strive to be transport independent
CDR MUST allow compression of the data
CDR MUST allow packaging of the data
CDR MUST define the sharing of fonts among all components of a document
CDR MUST support server-side adaptation
CDR MUST support limited bandwidth networks and limited capability devices

Requirements for CDR Profile 1 (Rich Multimedia Content) are given as:

CDR Profile 1 MUST support user interaction model
CDR Profile 1 MUST explain how User Agent is able to identify a CDR Profile 1 document
CDR Profile 1 MUST support scalable graphics
CDR Profile 1 MUST support audio
CDR Profile 1 MUST support grid, flow, overlapping layouts
CDR Profile 1 MUST define transparency support for SVG backgrounds
CDR Profile 1 MAY define transparency support for XHTML backgrounds
CDR Profile 1 MUST support identification of markup and versions in CDF documents
CDR Profile 1 MUST support scalable diagrams that can be animated and can cause link traversal
CDR Profile 1 MUST define how to reference SVGT graphics and resources from an XHTML document
CDR Profile 1 MUST support advertising the specific supported versions of formats and capabilities in headers
CDR Profile 1 MUST support XHTML as a root/host language
The XHTML <object> element MUST be used for referring to other formats from XHTML
CDR Profile 1 MUST support Non-interactive Background SVG
CDR Profile 1 MUST define for animated SVG icons to act like HTML images (no need for interactivity, links, zoom and pan)
CDR Profile 1 MUST define a way for events to trigger SVG animation
CDR Profile 1 MUST define how an XHTML document can reference an SVG Tiny document
CDR Profile 1 MUST define the interaction model for an SVG document referenced by an XHTML document
CDR Profile 1 MUST define the process for real-estate negotiation between an XHTML document and a referenced SVG document
CDR Profile 1 MUST define handling of leftover SVG area
CDR Profile 1 MUST define system font support in SVG

Sunday, March 20, 2005

The W3C Timed Text (TT) Working Group has posted the second public working draft of Timed Text (TT) Authoring Format 1.0 – Distribution Format Exchange Profile (DFXP). According to the abstract,

This document specifies the distribution format exchange profile (DFXP) of the timed text authoring format (TT AF) in terms of a vocabulary and semantics thereof.

The timed text authoring format is a content type that represents timed text media for the purpose of interchange among authoring systems. Timed text is textual information that is intrinsically or extrinsically associated with timing information.

The Distribution Format Exchange Profile is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions.

In addition to being used for interchange among legacy distribution content formats, DFXP content may be used directly as a distribution format, providing, for example, a standard content format to reference from a <text> or <textstream> media object element in a [SMIL2] document.

Friday, March 18, 2005

I've posted the notes from this morning's RELAX NG session, one of two new classes I'm giving this year at Software Development 2005 West. The talk went well, though there's still a lot of skepticism (which I hope I ameliorated) about whether anything other than the W3C XML Schema Language has a chance. Tool support was a particular concern.

JAPISoft has released EditiX 3.1, a $92 payware XML editor written in Java. Features include XPath location and syntax error detection; context sensitive popups based on DTD, W3C XML Schema Language, and RelaxNG schemas; XSLT and XSL-FO previews; XInclude; XML catalogs; an XSLT debugger; DocBook support; and multi-view preview. Version 3.1 adds a file browser. EditiX is available for Mac OS X, Linux, and Windows. Upgrades from 1.x are $59.

Thursday, March 17, 2005

I've posted the notes from this morning's XForms session, one of two new classes I'm giving this year at Software Development 2005 West. There was lots of interest in this from the audience, despite the bleeding edgeness of the topic. The talk went very well, despite the late discovery that X-Smiles really doesn't work on Mac OS X 10.2. (Undergraduates really shouldn't be allowed to release code.) I think a lot of people got interested in XForms and are going to start looking at it seriously.

Tuesday, March 15, 2005

I've posted the notes from yesterday's tutorials, XML Fundamentals and Processing XML with SAX and DOM.

Saturday, March 12, 2005

I'm leaving today for Software Development 2005 West in Santa Clara. Hope to see everyone there, but updates here will be a little slower for the next week or so. Looks like a fun show. If you're in the Valley, and you haven't registered yet, expo only passes are still free and provide access to the keynotes and special panels and events. I'm not sure if BoFs are included or not, but I promise I won't look too closely at the badges (or lack thereof) at my Effective XML BoF Monday night by the pool at the Santa Clara Westin. 7:00-8:30. Hope to see you there!

Friday, March 11, 2005

The W3C has released version 9.1 of Amaya, their open source testbed web browser and authoring tool for Solaris, Linux, Windows, and Mac OS X that supports HTML 4.01, XHTML 1.0, XHTML Basic, XHTML 1.1, HTTP 1.1, MathML 2.0, SVG, XML, and much of CSS 2. Version 9.1 features a new user interface. They've also released version 8.7.2, a bug fix release for the old user interface.

Thursday, March 10, 2005

The Helsinki University of Technology has released X-Smiles 0.93, a proof-of-concept XForms engine written in Java. It isn't very polished, but it does run on most platforms. I'll be using this for demos in my XForms session at Software Development 2005 West next week. This release cleans up various parts of the application, though a lot of work remains to be done.

Chiba 1.0, an open source, web-based implementation of XForms based on servlets and XSLT, has been released. Chiba enables XForms to be used in current browsers without plugins or special requirements on the client-side. Chiba is published under the artistic license.

Wednesday, March 9, 2005

BEA has published the proposed final draft of Java Specification Request 181, Web Services Metadata for the Java Platform. According to the draft,

This specification defines a simplified model for Web Services programming that is easy to learn and rapid to develop. The J2EE standard deployment technologies, APIs, and protocols require the J2EE developer to master a substantial amount of information. This JSR reduces the amount of information required to implement Web Services on J2EE by using metadata to declaratively specify the Web Services that each application provides. The metadata annotates the Java source file that implements the Web Service. While the metadata is human readable and editable using a simple text editor, graphical development tools can represent and edit the Java source file using higher levels of abstraction specific to Web Services. This is a simpler and more powerful development environment than traditional coding tools that are used to develop source code using low level APIs.

This specification relies on the JSR-175 specification - “A Program Annotation Facility for the Java™ Programming Language” - for the Web Services metadata that annotates a Web Service implementation. This document is using JSR-175 features as described in the Public Draft Specification of JSR-175.

JSR-181 defines the syntax and semantics of Web Services metadata and default values to be used, but does not define a runtime or container environment. Instead, implementers are expected to provide tools that map the annotated Java classes onto a specific runtime environment. However while this specification does not constrain the Java environment on which Web Services are run, it assumes a J2SE 5.0 compiler as well as the functionality of the J2EE 1.4 containers. In particular, JSR-181 expects features such as JAX-RPC 1.1 and JSR-109, along with the compiler and language extensions from JSR-175 to be present.

A JSR-181 implementation must produce a deployable Java Web Service application that can run on the target Java environment. The deployed application must exhibit the proper behavior described by the Web Services metadata and Java source code. Any two JSR-181 processors starting from the same valid annotated Java Web Services file will produce equivalent Web Service applications, even though they may deploy on very different Java environments. This ensures portability of JSR-181 compliant Java files.

IBM has published a maintenance release of Java Specification Request 110, Java APIs for WSDL. This is actually a fairly major, functional update compared to most maintenance releases.

Tuesday, March 8, 2005

Norm Walsh has published DocBook NG: The “IPA” Release; a.k.a. beta 9 of DocBook 5.0. "DocBook NG is a RELAX NG reimplementation of DocBook. It is a significant redesign that attempts to remain true to the spirit of DocBook." This release "cleans up a few content models and makes a few more elements ubiquitous." It includes both a RELAX NG schema and a DTD.

Steve Whitlatch has released the DocBook XSL Configurator, an open source "Java application used to create DocBook XSL FO customization layers. The application presents users with a tabbed pane containing several tables. Each row in each table contains several cells, one of which is editable and contains the text of the default setting for a specific DocBook XSL FO parameter. Users create projects containing paths to DocBook XML, common-customization XSL, an external XSLT processor, etc. Users then click through the tables, select DocBook XSL FO parameters they want to include in a customization layer, edit those parameters, include the customization layer in a project, write out the customization layer as an XSL file, and apply the XSL to the project's XML using the project's specified XSLT processor." Java 1.5 and the DocBook XSL stylesheets 1.67-2 are required.

John Krasnay has released Vex 1.1.1, an open source (LGPL) XML editor that features a word processor-like interface. Vex is based on the Eclipse platform. It supports DocBook 4.1.2, 4.2, 4.3, Simplified DocBook 1.0, and XHTML 1.0 Strict and can be configured for other DTDs. 1.1.1 is a bug fix release.

Sun has posted version 0.3.2 of xmlroff, an open source XSL Formatting Objects to PDF converter. xmlroff is written in C for Linux, and relies on the libxml2, libxslt, and the GLib, GObject and Pango libraries from GTK+ and GNOME (though neither GTK+ nor Gnome is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This release adds support for X11 color names in the color attribute.

Ryan Tomayko has posted Kid 0.6, "a simple Pythonic template language for XML based vocabularies. It was spawned as a result of a kinky love triangle between XSLT, TAL, and PHP." The language is based on just six attributes: kid:for, kid:if, kid:def, kid:content, kid:omit, and kid:replace; each of which contains a Python expression. Since this expression can point to externally defined functions, this is most of what you need. In addition there are attribute value templates similar to XSLT's, and <?python?> processing instructions can embed code directly in the XML document. I'm not sure I approve of the use of processing instructions in the language, but I'm not sure I don't either. Not having to escape XML-significant symbols like < and & in the embedded code is convenient. Kid templates are compiled to Python byte-code and can be imported and invoked like normal Python code. Kid templates generate SAX events and can be used with existing libraries that work along SAX pipelines. This release adds template inheritance, match templates, cElementTree support, and a refined Python API. Overall it looks like a fairly well-designed, well-thought out system that has clearly learned from the mistakes of gnarly systems like PHP, JSP, and ASP.
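To give a feel for attribute-driven templating of this kind, here is a toy Python sketch that handles just two attributes, modeled loosely on the if and content attributes described above. It is emphatically not Kid's implementation or API: the namespace URI is a placeholder, the function names are invented, it interprets the template at run time instead of compiling it to byte-code, and it calls eval on the attribute values, which is unsafe for untrusted templates.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:kid-like"      # placeholder namespace, not Kid's real one
IF_ATTR = "{%s}if" % NS
CONTENT_ATTR = "{%s}content" % NS

TEMPLATE = """<div xmlns:kid="urn:example:kid-like">
  <h1 kid:content="title.upper()">placeholder title</h1>
  <p kid:if="show_note">Only shown when show_note is true.</p>
</div>"""

def render(element, variables):
    """Expand the toy 'if' and 'content' attributes in place.

    Each attribute value is evaluated as a Python expression against
    `variables`. A condition on the root element itself is not handled.
    """
    for child in list(element):
        condition = child.attrib.pop(IF_ATTR, None)
        if condition is not None and not eval(condition, {}, variables):
            element.remove(child)        # drop the element when the test fails
            continue
        render(child, variables)
    expression = element.attrib.pop(CONTENT_ATTR, None)
    if expression is not None:
        for child in list(element):
            element.remove(child)        # replace all existing content
        element.text = str(eval(expression, {}, variables))
    return element

def render_string(template, **variables):
    """Parse a template string, expand it, and serialize the result."""
    root = render(ET.fromstring(template), variables)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(render_string(TEMPLATE, title="hello", show_note=False))
```

Because the template is itself well-formed XML, a standard parser does all the lexical work and the template engine only has to walk the tree.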

Monday, March 7, 2005

Sun has posted the second early draft review of Java Specification Request 224: Java™ API for XML-Based RPC (JAX-RPC) 2.0. JAX-RPC is a Java API for working with SOAP and WSDL based web services. According to the draft,

Since the release of JAX-RPC 1.0, new specifications and new versions of the standards it depends on have been released. JAX-RPC 2.0 relates to these specifications and standards as follows:

JAXB

Due primarily to scheduling concerns, JAX-RPC 1.0 defined its own data binding facilities. With the release of JAXB 1.0[9] there is no reason to maintain two separate sets of XML mapping rules in the Java™ platform. JAX-RPC 2.0 will delegate data binding-related tasks to the JAXB 2.0[10] specification that is being developed in parallel with JAX-RPC 2.0. JAXB 2.0[10] will add support for Java to XML mapping, additional support for less used XML schema constructs, and provide bidirectional customization of Java ⇔ XML data binding. JAX-RPC 2.0 will allow full use of JAXB provided facilities including binding customization and optional schema validation.

SOAP 1.2

Whilst SOAP 1.1 is still widely deployed, it’s expected that services will migrate to SOAP 1.2 now that it is a W3C Recommendation. JAX-RPC 2.0 will add support for SOAP 1.2 whilst requiring continued support for SOAP 1.1.

WSDL 2.0

The W3C is expected to progress WSDL 2.0[11] to Recommendation during the lifetime of this JSR. JAX-RPC 2.0 will add support for WSDL 2.0 whilst requiring continued support for WSDL 1.1.

WS-I Basic Profile 1.1

JAX-RPC 1.1 added support for WS-I Basic Profile 1.0. WS-I Basic Profile 1.1 is expected to supersede 1.0 during the lifetime of this JSR and JAX-RPC 2.0 will add support for the additional clarifications it provides.

A Metadata Facility for the Java Programming Language (JSR 175)

JAX-RPC 2.0 will define use of Java annotations[12] to simplify the most common development scenarios for both clients and servers.

Web Services Metadata for the Java Platform (JSR 181)

JAX-RPC 2.0 will align with and complement the annotations defined by JSR 181[13].

Implementing Enterprise Web Services (JSR 109)

The JSR 109[14] defined jaxrpc-mapping-info deployment descriptor provides deployment time Java ⇔ WSDL mapping functionality. In conjunction with JSR 181[13], JAX-RPC 2.0 will complement this mapping functionality with development time Java annotations that control Java ⇔ WSDL mapping.

Web Services Security (JSR 183)

JAX-RPC 2.0 will align with and complement the security APIs defined by JSR 183[15].

JAX-RPC 2.0 will improve support for document/message centric usage:

Asynchrony

JAX-RPC 2.0 will add support for client side asynchronous operations.

Non-HTTP Transports

JAX-RPC 2.0 will improve the separation between the XML based RPC framework and the underlying transport mechanism to simplify use of JAX-RPC with non-HTTP transports.

Message Access

JAX-RPC 2.0 will simplify client and service access to the messages underlying an exchange.

Session Management

JAX-RPC 1.1 session management capabilities are tied to HTTP. JAX-RPC 2.0 will add support for message based session management.

JAX-RPC 2.0 will also address issues that have arisen with experience of implementing and using JAX-RPC 1.0:

Inclusion in J2SE

JAX-RPC 2.0 will prepare JAX-RPC for inclusion in a future version of J2SE. Application portability is a key requirement and JAX-RPC 2.0 will define mechanisms to produce fully portable clients.

Handlers

JAX-RPC 2.0 will simplify the development of handlers and will provide a mechanism to allow handlers to collaborate with service clients and service endpoint implementations.

Versioning and Evolution of Web Services

JAX-RPC 2.0 will describe techniques and mechanisms to ease the burden on developers when creating new versions of existing services.

Backwards Compatibility of Binary Artifacts

JAX-RPC 2.0 will not preclude preservation of backwards binary compatibility between JAX-RPC 1.x and 2.0 implementation runtimes.

Sunday, March 6, 2005

Nokia has released the final specification for JSR-226, Scalable 2D Vector Graphics API. "This API is targeted for low-end mobile devices with constraints in memory, screen size, and computational power. The goal of this specification is to define an optional API package for rendering Scalable 2D vector images, including external images in SVG format. The main target use cases of this API are map visualization, scalable icons and other applications which require scalable, animated graphics." This is basically a stripped-down DOM/SVG-DOM subset more likely to fit into small devices than the full SVG DOM, along with a few classes for telling applications to display SVG images.

Friday, March 4, 2005

The OpenOffice Project has posted the first beta of OpenOffice 2.0, an open source office suite for Linux and Windows that saves all its files as zipped XML. New features in 2.0 include a multipane view, custom shapes, enhanced database frontend, mail merge wizard, nested tables, digital signatures, XForms, and the ability to open and save WordPerfect files. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

I'm pretty down on OpenOffice these days. I used the 1.0 version to write Effective XML, which was a poor choice that probably cost me months. I should have used Word. Microsoft Word is not a paragon of usability, but it's functional, which is more than I've ever been able to say for OpenOffice. In fact, OpenOffice invented whole new categories of "GUI Bloopers", preeminent among them, "Don't add a menu item for functionality you know doesn't work, just because you expect to get around to it sometime in the next couple of years." 1.1 was a slight improvement, but only slight. It fixed the most glaring and obvious bugs, but still didn't produce a product that allowed a writer to simply write without thinking about the interface, something Word does fairly well. OpenOffice might be a reasonable alternative for someone who uses their word processor infrequently enough that they're willing to put up with some pain in order to save a few bucks. However, it's clearly inadequate for those of us who write for a living.

2.0 might be an improvement, but I don't know. These days I've almost completely retired my Linux and Windows systems in favor of a Mac. There's no Mac version of 2.0, and after years of telling Mac developers not to bother porting 1.x because 2.0 was going to be so much better, the development team has realized they have no plan at all for ever running on the Mac. OpenOffice needs to learn the VRML lesson: you can't take over the world if you can't run on a Mac.

From the browser that wouldn't die department, AOL has posted the first beta of Netscape 8. This is basically just a repackaged, reskinned Firefox 1.0, security flaws and all. It does have the unique feature of allowing you to switch between the Gecko and IE rendering engines. Most users should stick to Firefox 1.0.1 or Mozilla.

Thursday, March 3, 2005

I've posted the second alpha release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including XPath support. Alpha 2 speeds up XPath queries by roughly a factor of 10. It also fixes assorted small bugs, and adds an XPathTypeException class that distinguishes syntactic errors from the case where an XPath expression returns something other than a node-set.

Wednesday, March 2, 2005

The W3C Internationalization GEO (Guidelines, Education & Outreach) Working Group has updated the working draft of Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0. According to the draft, "Specifying the language of content is useful for a wide number of applications, from linguistically sensitive searching to applying language-specific display properties. In some cases the potential applications for language information are still waiting for implementations to catch up, whereas in others, such as detection of language by voice browsers, it is a necessity today. Marking up language information is something that can and should be done today. Without it, it is not possible to take advantage of any of these applications. This document is one of a series of documents providing HTML authors with techniques for developing internationalized HTML using XHTML 1.0 or HTML 4.01, supported by CSS1, CSS2 and some aspects of CSS3. It focuses specifically on advice about specifying the language of content." This advice is summarized in 16 techniques:

Technique 1: Always declare language in the html tag
Technique 2: html declarations for multilingual docs
Technique 3: Declare language changes inside the document
Technique 4: Should I use the lang or xml:lang attribute?
Technique 5: Don't use Content-Language for text-processing
Technique 6: Don't use the body tag rather than the html tag
Technique 7: When attribute and content are in different languages
Technique 8: Use HTTP or the Content-Language meta tag for metadata
Technique 9: Provide a comma-separated list of languages
Technique 10: Division of multilingual docs
Technique 11: Use RFC3066
Technique 12: Use short language codes
Technique 13: Use Hans and Hant codes
Technique 14: Pros and cons of identifying the language
Technique 15: Using hreflang with CSS
Technique 16: Don't use flags to indicate languages

Tuesday, March 1, 2005

Slashdot has reviewed Effective XML. Short version: they like it. :-)

I've added several shows to the XML Conferences list, including a new one in Prague in June. I'll definitely be at Software Development 2005 in Santa Clara in a couple of weeks. Possibly I'll be at XTech and/or Extreme Markup Languages. Details are still being worked out.

The xframe project has posted beta 7 of xsddoc, an open source documentation generator for W3C XML Schemas based on XSLT. xsddoc generates JavaDoc-like documentation of schemas. Java 1.3 or later is required.

Monday, February 28, 2005

The Mozilla Project has posted the first beta of Mozilla 1.8. New features in 1.8 include FTP uploads, improved junk mail filtering, better Eudora import, and an increase in the number of cookies that Mozilla can remember. It also makes various small user interface improvements, gives users the option to disable CSS globally or on a per-page basis, and adds support for CSS quotes. Beta 1 fixes bugs and adds preliminary support for ECMAScript for XML "except for the DOM binding magic, which is coming in 1.8b2."

Saturday, February 26, 2005

The W3C XML Schema Working Group has posted the second public working drafts of XML Schema 1.1 Part 1: Structures and XML Schema 1.1 Part 2: Datatypes. According to the introduction to the structures spec,

The Working Group has two main goals for this version of W3C XML Schema:

Significant improvements in simplicity of design and clarity of exposition without loss of backward or forward compatibility;
Provision of support for versioning of XML languages defined using the XML Schema specification, including the XML transfer syntax for schemas itself.
These goals are in tension with one another. The Working Group's strategic guidelines for changes between versions 1.0 and 1.1 can be summarized as follows:
Support for versioning (acknowledging that this may be slightly disruptive to the XML transfer syntax at the margins)
Bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)
Editorial changes
Design cleanup will possibly change behavior in edge cases
Non-disruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)
Design cleanup will possibly change component structure (changes to functionality restricted to edge cases)
No significant changes in functionality
No changes to XML transfer syntax except those required by version control hooks and bug fixes
The aim with regard to compatibility is that
All schema documents conformant to version 1.0 of this specification should also conform to version 1.1, and should have the same validation behaviour across 1.0 and 1.1 implementations (except possibly in edge cases and in the details of the resulting PSVI);
The vast majority of schema documents conformant to version 1.1 of this specification should also conform to version 1.0, leaving aside any incompatibilities arising from support for versioning, and when they are conformant to version 1.0 (or are made conformant by the removal of versioning information), should have the same validation behaviour across 1.0 and 1.1 implementations (again except possibly in edge cases and in the details of the resulting PSVI);

Changes in the data type spec include:

"0000" is a legal year and values with negative years map onto the timeline such that "the year 0000 is 1 B.C.E., the year –0001 is 2 B.C.E., etc."
Distinction between identity and equality; for instance positive and negative zero would be equal but not identical. Think of the difference between == and equals() in Java.
New yearMonthDuration and dayTimeDuration types
A precisionDecimal type
An anyAtomicType data type
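
The year-zero mapping and the identity/equality distinction above can be sketched in Java. This is only an analogy, not the schema spec's own definitions: java.time happens to number years the same way (proleptic year 0 is 1 B.C.E.), and a bit-for-bit comparison stands in for "identity" here. The class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

public class SchemaDatatypeSketch {
    // Maps a proleptic year that is zero or negative to its B.C.E. year number.
    // java.time uses the same convention as the draft: 0 is 1 B.C.E., -1 is 2 B.C.E.
    static int bceYear(int prolepticYear) {
        return LocalDate.of(prolepticYear, 1, 1).get(ChronoField.YEAR_OF_ERA);
    }

    // Equality: IEEE 754 comparison treats +0.0 and -0.0 as the same number.
    static boolean equalValues(double a, double b) {
        return a == b;
    }

    // Stand-in for identity (an assumption of this sketch): same bits, sign included.
    static boolean identicalValues(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println("Year 0000 is " + bceYear(0) + " B.C.E.");
        System.out.println("Year -0001 is " + bceYear(-1) + " B.C.E.");

        System.out.println("0.0 equal to -0.0? " + equalValues(0.0, -0.0));
        System.out.println("0.0 identical to -0.0? " + identicalValues(0.0, -0.0));

        // The familiar object version of the same split: two distinct
        // String objects that hold the same characters.
        String first = new String("2005");
        String second = new String("2005");
        System.out.println("equals(): " + first.equals(second));
        System.out.println("==: " + (first == second));
    }
}
```

Note that for primitive doubles Java's == plays the role of equality; the ==-versus-equals() contrast in the usual sense applies to object references, as with the two strings.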

There are also open issues including how to align the W3C XML schema language with XML 1.1.

Friday, February 25, 2005

The W3C XML Binary Characterization Working Group (a bit of an Orwellian name, that. The one thing that's guaranteed about this effort is that the format it produces will not be XML.) has published the first public working draft of XML Binary Characterization Measurement Methodologies. "This document describes measurement aspects, methods, caveats, test data, and test scenarios for evaluating the potential benefits of an alternate serialization for XML. This document relies on the XML Binary Characterization Working Group (XBC WG) documents for Use Cases and Properties. The focus of this document is to provide a basis for later comparison rather than reporting of actual measurements of actual implementations. The examined and potential use cases represent existing uses that might benefit from the use of an XML-like format, if it had certain additional properties. This potential expansion of the XML community depends on the existence, identification, and evolution of solutions that cover the broadest problem footprint in the best fashion. The XBC WG Characterization document represents the working group's consensus of required and useful properties. This document discusses how fulfillment of those properties can be precisely evaluated and how combinations of properties are best compared."

The W3C XML Binary Characterization Working Group has also updated the working drafts of XML Binary Characterization Use Cases and XML Binary Characterization Properties. The use cases document describes 18 different scenarios where a so-called binary XML format might be desirable. I don't find most of these at all convincing. They tend to divide into scenarios that can operate just fine with traditional XML (FIXML) and scenarios that shouldn't be going anywhere near anything that even smells of XML (Floating Point Arrays in the Energy Industry, Supercomputing and Grid Processing). Have we all forgotten the Edsel?

The properties document is more theoretical, and discusses the different characteristics XML has and that a non-XML binary format might or should have. These properties include:

Processing Efficiency
Small Footprint
Space Efficiency
Accelerated Sequential Access
Compactness
Content Type Management
Deltas
Directly Readable and Writable
Efficient Update
Embedding Support
Encryptable
Explicit Typing
Extension Points
Format Version Identification
Fragmentable
Generality
Human Language Neutral
Human Readable and Editable
Integratable into XML Stack
Localized Changes
No Arbitrary Limits
Platform Neutrality
Random Access
Robustness
Roundtrip Support
Schema Extensions and Deviations
Schema Instance Change Resilience
Self Contained
Signable
Specialized codecs
Streamable
Support for Error Correction
Transport Independence

The Mozilla Project has released Firefox 1.0.1, the open source web browser that is rapidly gaining on Internet Explorer. Firefox supports HTML, XHTML, CSS, and XSLT. MathML and SVG aren't supported out of the box, but can be added. This is a bug fix release that includes several security fixes. Among them, non-ASCII characters in domain names are now displayed in an encoded ASCII format rather than using Unicode. This should make phishing attempts based on domain names like www.раураІ.com more obvious. All users should upgrade.
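
For a rough idea of what the encoded ASCII form looks like, here is a sketch using java.net.IDN (available since Java 6). It is not Firefox's code; it only illustrates the IDNA/Punycode encoding that turns a lookalike Cyrillic label into a conspicuous xn-- label. The class and method names are hypothetical.

```java
import java.net.IDN;

public class PunycodeDisplay {
    // Converts a host name to its ASCII-compatible (Punycode) form.
    // Labels that are already plain ASCII pass through unchanged.
    static String toDisplayForm(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // The lookalike host from the text, written with Unicode escapes so the
        // source file compiles regardless of its encoding. Every letter in the
        // middle label is Cyrillic.
        String lookalike = "www.\u0440\u0430\u0443\u0440\u0430\u0406.com";

        System.out.println(toDisplayForm(lookalike));        // middle label becomes xn--...
        System.out.println(toDisplayForm("www.paypal.com")); // unchanged
    }
}
```

The point of showing the encoded form is that the spoofed name no longer looks anything like the real one in the location bar.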

The Software Development 2005 Expo in Santa Clara next month (March 14-18) is looking for a few more volunteers to man doors, distribute notes, and similar tasks. For each day you volunteer you get to attend the conference for a day free, and most volunteer days involve nothing more strenuous than sitting in the back of the room listening to the presentation, and collecting eval forms at the end; so really, it's a nice way to attend the show for free.

Thursday, February 24, 2005

jCatalog Software has posted a beta of XSLfast 2.0, an €890 graphical editor for XSL Formatting Objects documents that supports mail merge and form processing. New features in 2.0 include:

Simplified table handling
Multiple pages and reusable layouts can be integrated in one template
Background images for table rows and cells
Attributes are now shown in the XML structure window
The contents of XML nodes can be displayed as a list

Antenna House has released XSL Report Designer 2.0, a $1000 payware Windows program for mapping XML documents to layouts. The output is XSL Formatting Objects.

Wednesday, February 23, 2005

The W3C RDF Data Access Working Group has published the second public working draft of SPARQL Query Language for RDF. According to the introduction,

An RDF graph is a set of triples, each triple consisting of a subject, a predicate and an object, as defined in RDF Concepts and Abstract syntax. These triples can come from a variety of sources. For instance, they may come directly from an RDF document. They may be inferred from other RDF triples. They may be the RDF expression of data stored in other formats, such as XML or relational databases.

SPARQL (SPARQL Protocol And RDF Query Language) is a query language for getting information from such RDF graphs. It provides facilities to:

extract information in the form of URIs, bNodes, plain and typed literals.

extract RDF subgraphs.

construct new RDF graphs based on information in the queried graphs.

As a data access language, it is suitable for both local and remote use. When used across networks, the companion SPARQL Protocol for RDF document describes a remote access protocol.

Here's a simple example SPARQL query adapted from the draft:

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
PREFIX  : <http://example.org/book/>
SELECT  $var
WHERE   ( :book1  dc:title  $var )

The $ indicates a variable name. (The previous draft used a question mark instead.) This query stores the title of a book in a variable named var. There are boolean and numeric operators as well.

Tuesday, February 22, 2005

I'm pleased to announce the first alpha release of XOM 1.1, my free-as-in-speech (LGPL) dual streaming/tree-based API for processing XML with Java. Version 1.1 maintains backwards compatibility with XOM 1.0 while adding a number of important new features including:

XPath
a setInternalDTDSubset method in DocType
Document subset canonicalization
Exclusive XML canonicalization
xml:id support
Parameters can be passed to XSL transforms

The XPath support is especially useful. You can now write declarative queries like these that find all the person elements instead of writing complicated, fragile, explicit navigation instructions:

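// Find every person element anywhere in the document
Nodes people = doc.query("//person");
// Bind the html prefix to the XHTML namespace for the namespace-qualified query
XPathContext context = new XPathContext("html", "http://www.w3.org/1999/xhtml");
Nodes toc = doc.query("//html:div[@id='toc']/child::node()", context);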

The XPath support is based on the latest Jaxen source code. This is the same engine used in JDOM and dom4j. However, before bundling this with XOM I fixed a lot of bugs in Jaxen, and worked around several others. XOM is much more conformant to the XPath specification than either JDOM or dom4j. There's still at least one nasty bug in XPath evaluation I haven't been able to fix yet, but so far I've only seen it with an unusual, redundant union expression that's unlikely to arise in practice (//. | /). More importantly, there are probably other undiscovered bugs waiting to bite. If you spot any of the critters, holler and I'll try to stomp them.

Monday, February 21, 2005

Topologi has released version 2.3 of the Topologi Markup Editor, a $99 payware XML/SGML editor for Windows and Linux. It supports RELAX NG, Schematron, DTDs, and the W3C XML Schema Language. New features in this release include screenshot annotation and "Happy Tags" (whatever those are).

Topologi has also released four new $29 payware Windows utilities for inspecting XML documents:

Topologi XML Detective: Reports all the elements, attributes and namespaces: their parents, children, positions, XPaths and in which files these objects do, or do not occur.
Topologi Complexity Detective with DTD Trimmer: Reports the Document Complexity Metric for XML documents. Especially helpful for pre-sales project estimation and project management. Includes a DTD trimmer that reduces a DTD to a minimal valid structure based on sampling document instances.
Topologi Word Detective: A point and click XML indexing tool that reports on all the words found in elements or attributes in individual documents or across entire collections. The Word Detective can even report on occurrences of words that have escaped being marked up in a specified way.
Topologi XML Judge: Validate one or more files using XML Schemas, DTD, RELAX NG and Schematron. Generate usage schemas from a document set to check that new files do not contain valid but novel markup.

Sunday, February 20, 2005

The W3C Cascading Style Sheets working group has posted a new working draft of CSS3 Backgrounds and Borders Module. Properties defined in this draft include background-color, background-image, background-repeat, background-attachment, background-position, background-clip, background-origin, background-size, background-break, background, border-color, border-style, border-width, border-image, border-radius, border-break, border-top, border-bottom, border-right, border-left, border, and box-shadow.

Saturday, February 19, 2005

The W3C Voice Browser working group has published the first public working draft of Pronunciation Lexicon Specification (PLS) Version 1.0. "This document defines the syntax for specifying pronunciation lexicons to be used by speech recognition and speech synthesis engines in voice browser applications." Defined elements include lexicon, meta, metadata, lexeme, grapheme, phoneme, alias, and example.

Friday, February 18, 2005

The W3C Web Services Addressing Working Group has updated three working drafts. Web Services Addressing - Core defines generic extensions to the Infoset for endpoint references and message addressing properties. Web Services Addressing - SOAP Binding and Web Services Addressing - WSDL Binding describe how the abstract properties defined in the core spec are implemented in SOAP and WSDL respectively.

Toni Uusitalo has posted Parsifal 0.9.1, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Version 0.9.1 is a bug fix release. Parsifal is in the public domain.

Thursday, February 17, 2005

The W3C Internationalization Working Group has published the final recommendation of Character Model for the World Wide Web 1.0: Fundamentals. "This Architectural Specification provides authors of specifications, software developers, and content developers with a common reference for interoperable text manipulation on the World Wide Web, building on the Universal Character Set, defined jointly by the Unicode Standard and ISO/IEC 10646. Topics addressed include use of the terms 'character', 'encoding' and 'string', a reference processing model, choice and identification of character encodings, character escaping, and string indexing."

Wednesday, February 16, 2005

Wolfgang Hoschek has released NUX 1.0, an open source add-on package for XOM that connects it to Michael Kay's Saxon 8 XSLT 2/XPath 2/XQuery processor and the Sun Multi-Schema Validator. It also provides thread-safe factories and pools for creating XOM Builder objects. NUX also includes yet another non-XML binary format. Mostly NUX addresses problems I personally don't find to be that important, so I haven't put them in the core of XOM. (Does the world really need yet another incompatible, opaque binary format for XML? For that matter, does it need even one?) However, there is one nice feature here I noticed that Hoschek hasn't emphasized: RELAX NG support for the XOM builder. Looking at the NUX API, I realize that I know how to integrate DTD and W3C XML Schema Processing into XOM, but I'm not at all sure how to integrate RELAX NG; and that's definitely worth doing. I should give some more thought to that. In the meantime, NUX can do it. NUX is published under a modified BSD license (no advertising clause).

JAPISoft has released EditiX 3.0, a $92 payware XML editor written in Java. Features include XPath location and syntax error detection, context sensitive popups based on DTD, W3C XML Schema Language, and RelaxNG schemas, XML differencing, XSLT and XSL-FO previews, and an XSLT debugger. Version 3.0 adds support for XML catalogs and XInclude. EditiX is available for Mac OS X, Linux, and Windows. Upgrades are $59.

Tuesday, February 15, 2005

Version 0.41 of Inkscape, a free-as-in-speech (GPL) drawing program for Windows and Linux that uses SVG as its native format, has been released. According to the web page, "The primary focus of 0.41 has been bug fixing. With over 100 bugs fixed since the 0.40 release, this significantly strengthened Inkscape on Windows and for international users. A number of large scale changes are planned for the 0.42 release, including converting the interface to use Gtkmm and the incorporation of a new Document Object Model (DOM), which will provide the core to build scripting onto. If you'd like to join in the work, please drop by."

Kiyut has released Sketsa 3.0, a $49 payware SVG editor written in Java. Version 3.0 supports plug-ins, adds a canvas background, and improves the user interface. Java 1.4.2 or later is required.

Monday, February 14, 2005

Michael Smith has posted version 1.68.1 of the DocBook XSL stylesheets. These support transforms to HTML, XHTML, and XSL-FO. Besides bug fixes, major enhancements in this release include localization support for Farsi and improved support for the XLink-based DocBook NG db:link element.

XMLmind has released version 2.9 of their XML Editor. This $220 payware product features word processor and spreadsheet like views of XML documents. This release adds support for RELAX NG, a schema language that continues its march to inevitable world domination. A free-beer hobbled version is also available.

Saturday, February 12, 2005

The W3C XSL and XML Query working groups have published nine revised working drafts.

Changes in XPath 2.0 in these drafts include:

The fn:id and fn:idref functions now work on values specified as xs:ID, xs:IDREF and xs:IDREFS as well as the DTD types ID, IDREF and IDREFS and the newly-defined xml:id.
A fn:codepoint-equal has been added that compares strings based on the Unicode code point collation.
A fn:doc-available function has been added to indicate whether an XML document can be retrieved from a given URI.
xdt:untypedAny has been changed to xdt:untyped.
xs:anyType is no longer an abstract type, but is now used to denote the type of a partially validated element node. Since there is no longer a meaningful distinction between abstract types and concrete types, these terms are no longer used in this document.
Value comparisons now return the empty sequence if either operand is the empty sequence.
The typed value of a namespace node is an instance of xs:string, not xdt:untypedAtomic.
The precedence of the cast and treat operators and unary arithmetic operators has been increased.
A new component has been added to the static context: context item static type.
The XPath 2.0 specification now clearly distinguishes between "statically-known namespaces" (a static property of an expression) and "in-scope namespaces" (a dynamic property of an element).

XSLT 2 specific changes in these drafts include:

A non-schema-aware processor now allows all the built-in types defined in XML Schema to be used; previously only a subset of the primitive types plus xs:integer were permitted.
Error codes have been assigned to some error conditions that previously had no code assigned.
xsl:use-when attributes can appear on elements that are not in the XSLT namespace, whether or not it is a literal result element. For example, it can usefully appear on an extension instruction.
The behavior of certain constructs in backwards-compatible mode has changed to more closely reflect the XSLT 1.0 behavior. Specifically:
- In backwards compatible mode, the xsl:number instruction now outputs NaN when the supplied value is an empty sequence or non-numeric, rather than signaling an error.
- In backwards compatible mode, parameters passed to a built-in template rule are not passed on.
- If no output method is explicitly requested, and the first element node output appears to be an XHTML document element, then under XSLT 2.0 the output method defaults to XHTML; with backwards compatibility enabled, the XML output method will be used.
- An XSLT 1.0 processor compared the value of the expression in the use attribute of xsl:key to the value supplied in the second argument of the key function by converting both to strings. An XSLT 2.0 processor normally compares the values as supplied. The XSLT 1.0 behavior is emulated if any of the xsl:key elements making up the key definition enables backwards-compatible behavior.
XPath expressions in attribute value templates are now expanded using the same rules as apply to the select attribute in instructions such as xsl:attribute. The effect of the change is that if the value of the expression contains several adjacent text nodes, no whitespace is inserted between the string values of these text nodes.
When the 3-argument form of the key function is used, the search is now restricted to the subtree rooted at the node identified by the third argument. Previously the third argument merely identified the document to be searched.
The rules for the format-number function have been changed so that numbers are never output with a trailing decimal point. (I'm not sure I like this change. It could cause problems when generating C code, for example.)
The undeclare-namespaces attribute has been renamed undeclare-prefixes.
It is now a recoverable error to generate nodes in the result tree using a namespace name that is not a valid instance of xs:anyURI. XSLT 1.0 explicitly stated that this was not an error; however, the XPath 2.0 data model assumes that the name of a node is a valid xs:QName, and the namespace part of a valid xs:QName, if present, must be a valid xs:anyURI. The fact that this error is recoverable, however, gives implementations freedom to avoid strict validation of namespace names if they wish to do so.
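
The worry about the format-number change in the list above can be illustrated with a small Java sketch. It does not run XSLT; it uses java.text.DecimalFormat, whose setDecimalSeparatorAlwaysShown switch produces the two spellings in question ("1." and "1"), and then builds a made-up C statement from each. The class name, method names, and the C statement are all assumptions for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class TrailingDecimalPoint {
    // Formats a whole number with ("1.") or without ("1") a trailing decimal point.
    static String format(long value, boolean keepTrailingPoint) {
        DecimalFormat df = new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        df.setDecimalSeparatorAlwaysShown(keepTrailingPoint);
        return df.format(value);
    }

    // Builds the kind of C statement a stylesheet might generate (hypothetical).
    static String cStatement(String numerator) {
        return "double half = " + numerator + " / 2;";
    }

    public static void main(String[] args) {
        System.out.println(cStatement(format(1, true)));  // numerator is a floating point literal
        System.out.println(cStatement(format(1, false))); // numerator is an integer literal

        // Java follows the same literal rules as C here, so the difference
        // can be seen directly: floating point division versus integer division.
        System.out.println(1. / 2);
        System.out.println(1 / 2);
    }
}
```

A stylesheet that relied on the trailing point to force floating point arithmetic in generated code would have to emit something like ".0" explicitly under the new rules.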

XQuery-specific changes in these drafts include:

An ordering declaration has been added to the Prolog, which affects the ordering semantics of path expressions, FLWOR expressions, and union, intersect, and except expressions. In addition, ordered and unordered operators have been introduced that permit ordering semantics to be controlled at the expression level within a query.
Validation has been separated from construction. Validation now occurs only as a result of an explicit validate expression. Validation modes are strict and lax, and are specified on the validate expression. New construction modes strip and preserve have been defined and are declared in the Prolog. The notion of "validation context" has been deleted. The XQuery definition of validation has been converged with the definition used in XSLT.
Function overloading: That is, multiple user-defined functions can have the same name as long as they have different numbers of arguments.
xdt:untypedAny is changed to xdt:untyped.
Computed namespace constructors are now completely static and are allowed only inside a computed element constructor. Namespace declarations in a computed element constructor must come before the element content, and must consist entirely of literals. The namespace prefix is optional. If absent, it has the effect of setting the default namespace for elements and types within the scope of the constructed element.
The syntax for variable initialization in the Prolog now uses an assignment operator (":="). Also, circularities in variable initialization are now static errors.
An error is raised if a module attempts to import itself (target namespace of importing module and imported modules are the same).
A schema can now be imported without specifying either a target namespace or a location hint.
Module imports and schema imports now accept multiple location hints, representing multiple physical resources in the same module or schema.
CData Sections are no longer considered to be constructors, but are simply a notational convenience for embedding special characters in the content of an element or attribute constructor.
Three new components have been added to the static context: XQuery Flagger status, XQuery Static Flagger status, and context item static type. (Note: Flagger status items were later deleted.)
An order by clause may now accept values of mixed type if they have a common type that is reachable by numeric type promotion and/or moving up the type derivation hierarchy, and if this common type has a gt operator.
In element and document node constructors, if the content sequence contains a document node, that node is replaced by its children (this was previously treated as an error).
Atomization now applies to the name expression of a computed processing instruction constructor.
It is now implementation-defined whether undeclaration of namespace prefixes in an element constructor is supported. If supported, this feature conforms to the semantics of Namespaces 1.1. In other words, if an element constructor binds a namespace prefix to the zero-length string, any binding of that prefix defined at an outer level is suspended within the scope of the constructed element.
In a computed text node constructor, the expression enclosed in curly braces is no longer optional, since it is not possible to construct an empty text node.
Rules for processing comment constructors have changed, to ensure that the resulting comment does not contain adjacent hyphens or end with a hyphen.

For the first time in years, most of the various working drafts are now in sync. (The formal semantics document still hasn't completely caught up.) Changes overall aren't major. These still aren't last call working drafts, though; and it seems unlikely the working group will finish this year. Just maybe the final versions will be released in 2006.

The W3C XQuery working group has also published the first public working draft of XQuery Update Facility Requirements. XQuery as it currently exists is basically just SELECT in SQL terms. This is the beginning of work on INSERT, UPDATE, and DELETE. This is just a list of proposed requirements for an eventual update language. No actual syntax or behavior is suggested in this draft.

In related news, Michael Kay has released version 8.3 of Saxon, his XSLT 2.0 and XQuery processor. Besides updating Saxon to cover the latest working drafts, this release makes the dependency on JAXP 1.3 a lot softer so that Saxon is much easier to install and run in Java 1.4 environments. Saxon 8.3 is published in two versions for both of which Java 1.4 or later is required. Saxon 8.3B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.3SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers." Upgrades from 8.x are free.

Friday, February 11, 2005

In an inspired bit of naming, Andrea Bittau has posted PsychoPath 0.1, an open source (LGPL), schema aware XPath 2.0 processor written in Java. According to Bittau, "A large amount of the specification has been implemented, tested and is probably usable. However, efficiency and performance has not been addressed yet."

Thursday, February 10, 2005

Michael Smith has posted version 1.68.0 of the DocBook XSL stylesheets. These support transforms to HTML, XHTML, and XSL-FO. Besides bug fixes, major enhancements in this release include:

Side floats, margin notes, and custom floats.
New parameters body.start.indent and body.end.indent
xml:id
refdescriptor.
Multiple refnamedivs.
Customization of index entries.
@floatstyle in figure
table-layout="auto" for XEP
sidebar-width and float-type processing instructions in sidebar
new hyphenate.verbatim.characters parameter to specify characters after which a line break can occur in verbatim environments.
itemizedlist.label.markup to enable selection of different bullet
New SVG admonition graphics and navigation images.

Wednesday, February 9, 2005

The W3C XML Core Working Group has posted the candidate recommendation of xml:id Version 1.0. This describes an idea that's been kicked around in the community for some time. The basic problem is how to link to elements by IDs when a document doesn't have a DTD or schema. The proposed solution is to predefine an xml:id attribute that would always be recognized as an ID, regardless of the presence or absence of a DTD or schema. Unfortunately, it's recently been discovered that this is pretty badly incompatible with canonical XML, which likes to inherit attributes in the XML namespace onto descendant elements, thus moving xml:id's from one element to another. This has downstream effects on XML digital signatures and XML encryption. The working group is still trying to figure out what to do about this, and fixing it may necessitate going back to last call.

The problem is really in canonical XML, which assumed that all attributes in the XML namespace would act like xml:lang and xml:space. Unfortunately the two defined after canonical XML don't behave that way. (xml:base also turns out to be subtly incompatible with canonicalization.) If we could go back in time, then fixing canonical XML would be the obvious choice. Hindsight's 20/20. However since we can't do that, and since canonical XML is already widely deployed, the proposal I favor suggests using an xmlid attribute in no namespace instead. The working group doesn't seem to like that. I'm afraid that solution is just a little too simple for the W3C since it doesn't activate the whole namespace machinery they're so fond of.
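To make the xml:id idea concrete, here is a minimal sketch in Python of looking elements up by xml:id without any DTD or schema. It is an illustration only: the sample document and the helper name index_by_xml_id are made up, and it relies on the standard library's ElementTree reporting attributes in the XML namespace under the expanded name shown below.

```python
import xml.etree.ElementTree as ET

# The xml prefix is permanently bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = "{%s}id" % XML_NS


def index_by_xml_id(root):
    """Map each xml:id value to its element; no DTD or schema is consulted.

    Hypothetical helper written for this example.
    """
    index = {}
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is not None:
            if value in index:
                raise ValueError("duplicate xml:id: %s" % value)
            index[value] = elem
    return index


# Made-up sample document with no DOCTYPE at all.
doc = ET.fromstring(
    '<book><chapter xml:id="intro"><para xml:id="p1">Hello</para></chapter></book>'
)
ids = index_by_xml_id(doc)
print(ids["p1"].text)  # prints Hello
```

Note that the lookup keys on the attribute of the element that actually carries it. The canonicalization problem described above arises precisely because canonical XML may copy such attributes onto descendants, which would make an index like this one ambiguous.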

Tuesday, February 8, 2005

The Big Faceless Organization has released the Big Faceless Report Generator 1.1.25, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. This is mostly a bug fix release. Java 1.2 or later is required.

Sunday, February 6, 2005

Syntext has posted the second release candidate of Serna 2.1, a $299 payware XSL-based WYSIWYG XML Document Editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, and spell checking. Version 2.0 adds a customizable GUI, "liquid" dialog boxes, multiple validation modes (strict, on, and off), and large document support.

Saturday, February 5, 2005

Oleg Tkachenko has released nxslt 1.5, a Windows command line utility for accessing the .Net XSLT engine. This release fixes bugs, supports the final W3C XInclude Recommendation, and can accept user credentials to be used when loading XML documents and XSLT stylesheets. nxslt is written in C# and requires the .NET Framework version 1.0 to be installed.

Friday, February 4, 2005

The W3C Synchronized Multimedia Working Group has posted the first public working draft of the Synchronized Multimedia Integration Language (SMIL 2.1). SMIL 2.1 has four goals:

Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behaviour of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.
Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. For example, SMIL components are used for integrating timing into XHTML and into SVG.
Extend the functionalities contained in the SMIL 2.0 into new or revised SMIL 2.1 modules.
Define new SMIL 2.1 Mobile Profiles incorporating features useful within the mobile industry.

Thursday, February 3, 2005

Mozilla has posted a beta of an XForms extension for Mozilla 1.8/Firefox 1.1. I'll be talking about this stuff at Software Development 2005 West in Santa Clara next month. Guess it's time to upgrade my browser again. I wouldn't mind having a chance to rehearse this presentation before then. If any New York area user groups are looking for last minute speakers in the next month, holler.

Wednesday, February 2, 2005

Toni Uusitalo has posted Parsifal 0.9.0, a minimal, non-validating XML parser written in ANSI C. The API is based on SAX2. Version 0.9.0 adds validation and external DTD subset support. Parsifal is in the public domain.

Tuesday, February 1, 2005

John Cowan has posted the second release candidate of TagSoup, an open source, Java-language, SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. Cowan writes, "This is not really a 'release candidate', but rather a set of bug fixes to TagSoup 1.0rc1. There are still some known problems in it, but they seem to appear only in very pathological input, like randomly generated HTML. I decided to release this version now in order to get the bug fixes to people who've been asking for them. Please upgrade and report on any problems you find." TagSoup is dual licensed under the Academic Free License and the GPL.

Sun has posted version 0.3.1 of xmlroff, an open source XSL Formatting Objects to PDF converter. xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib, GObject, and Pango libraries from GTK+ and GNOME (though neither GTK+ nor GNOME itself is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. "0.3.1 is a maintenance release that adds improved documentation and minor bug fixes."

Monday, January 31, 2005

Norm Walsh has released version 4.4 of DocBook, an XML application designed for technical documentation and books such as Processing XML with Java. This is a "maintenance release. It introduces no backwards-incompatible changes. All valid DocBook 4.3 documents are also valid DocBook 4.4 documents. The genesis of this release is a bug in the catalog files for DocBook V4.3. The Committee decided to produce a 4.4 release, incorporating a few recent backwards-compatible changes, rather than simply produce a 4.3.1 release to fix the bugs." New elements in DocBook 4.4 include package, bibliolist, and biblioref. Version 4.4 is available in XML and SGML DTDs. Unofficial RELAX NG and W3C XML schemas are also available.

Saturday, January 29, 2005

Mikhail Grushinskiy has released XMLStarlet 1.0, a command line utility for Linux that exposes a lot of the functionality in libxml and libxslt including validation, pretty printing, and canonicalization. This release fixes some bugs and has been recompiled against libxml2 2.6.17 and libxslt 1.1.12.

Friday, January 28, 2005

The W3C XML Protocol Working Group has published three final recommendations covering XOP, a MIME multipart envelope format for bundling XML documents with binary data:

XML-binary Optimized Packaging "defines the XML-binary Optimized Packaging (XOP) convention, a means of more efficiently serializing XML Infosets (see [XMLInfoSet]) that have certain types of content. A XOP package is created by placing a serialization of the XML Infoset inside of an extensible packaging format (such as MIME Multipart/Related, see [RFC 2387]). Then, selected portions of its content that are base64-encoded binary data are extracted and re-encoded (i.e., the data is decoded from base64) and placed into the package. The locations of those selected portions are marked in the XML with a special element that links to the packaged data using URIs."
SOAP Message Transmission Optimization Mechanism "describes an abstract feature and a concrete implementation of it for optimizing the transmission and/or wire format of SOAP messages. The concrete implementation relies on the [XML-binary Optimized Packaging] format for carrying SOAP messages."
Resource Representation SOAP Header Block "describes the semantics and serialization of a SOAP header block for carrying resource representations in SOAP messages."

Basically this is another whack at the packaging problem: how to wrap up several documents including both XML and non-XML documents and transmit them in a single SOAP request or response. In brief, this proposes using a MIME envelope to do that. This is all reasonable. I do question the wisdom, however, of pretending this is just another XML document. It's not. The goal is to ship binary data like images in their native binary form, which is sensible. What I don't like is claiming that this non-XML, MIME based format is XML because one could theoretically translate the binary data into Base-64, reshuffle the parts, and come up with something that is an XML document, even though no one will actually do that.

Why is there this irresistible urge throughout the technology community to call everything XML, even when it clearly isn't and clearly shouldn't be? XML is very good for what it does, but it doesn't and shouldn't try to be all things to all people. Fundamentally binary data such as scanned images and digitized movies is not something XML does well, and not something it ever will do well. Render into binary what is binary, and render into XML what is text.
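For a sense of what shipping binary data natively saves, the following rough sketch compares a binary payload with its Base-64 form, which is what the data would have to become to live inside an XML text node. The payload is random bytes standing in for an image; nothing here is taken from the XOP or MTOM specs beyond the idea of decoding the Base-64 before packaging.

```python
import base64
import os

# Stand-in for an image or other binary payload (random bytes, purely illustrative).
payload = os.urandom(3000)

# Embedding the payload in an XML text node requires a text encoding such as base64.
encoded = base64.b64encode(payload)

print(len(payload))  # 3000
print(len(encoded))  # 4000: every 3 bytes become 4 characters

# Decoding recovers the original bytes, which is the step XOP describes
# before placing the raw data in its own MIME part.
assert base64.b64decode(encoded) == payload
```

The one-third size penalty, plus the encode and decode work on each end, is the cost the MIME envelope avoids.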

Thursday, January 27, 2005

Cool discovery of the day: I just noticed that at the very bottom of the iTunes Music Store there's now an option to choose your country, so even though I'm in the U.S. I can now shop for music in a dozen plus countries and languages. For someone like me who prefers European pop to today's U.S. music, this is a very big deal. Right now I'm watching a German video from some band called "Rosenstolz" I've never heard of before. (Another tip: you have to pay for the songs, but the videos on the iTunes Music Store are free!) Some songs and videos are shared between stores, but some appear to be unique to particular stores, even in English. The French store has Marilyn Manson's Personal Jesus video, which apparently didn't make the cut at the U.S. store. (Great video, by the way. Also worth watching: Bowling for Soup - 1985, Lee Ann Womack - I May Hate Myself in the Morning, Avril Lavigne - Nobody's Home, and Eric Prydz - Call on Me. If the iTunes Music Store used a real web browser instead of a custom client, I might be able to provide links to these instead of just telling you the titles.)

Hmm, it looks like I can browse the stores, listen to previews, and watch the videos, but I can't actually buy music. I guess Apple hasn't quite worked out whatever licensing issues were preventing them from offering music across borders. I remain amazed at the incredible stupidity of the entertainment industry and its willingness to deliberately turn paying customers away. Oh well. Back to Gnutella.

Ryan Tomayko has posted Kid 0.5, "a simple Pythonic template language for XML based vocabularies. It was spawned as a result of a kinky love triangle between XSLT, TAL, and PHP." The language is based on just six attributes: kid:for, kid:if, kid:def, kid:content, kid:omit, and kid:replace; each of which contains a Python expression. Since this expression can point to externally defined functions, this is most of what you need. In addition there are attribute value templates similar to XSLT's, and <?python?> processing instructions can embed code directly in the XML document. I'm not sure I approve of the use of processing instructions in the language, but I'm not sure I don't either. Not having to escape XML-significant symbols like < and & in the embedded code is convenient. Kid templates are compiled to Python byte-code and can be imported and invoked like normal Python code. Kid templates generate SAX events and can be used with existing libraries that work along SAX pipelines. This release changes the license from GPL to MIT. Overall it looks like a fairly well-designed, well-thought out system that has clearly learned from the mistakes of gnarly systems like PHP, JSP, and ASP.
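The escaping point is easy to demonstrate with a plain XML parser. This sketch does not use Kid at all; it only uses Python's standard minidom to show that code placed in a processing instruction with the target python comes back verbatim, while the same code placed in element content has to escape < and &. The embedded expression and the code element are invented for the example.

```python
from xml.dom import minidom

# Code inside a processing instruction can use < and & directly.
with_pi = minidom.parseString("<root><?python ok = a < b and flags & 1?></root>")
pi = with_pi.documentElement.firstChild
assert pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
print(pi.target)  # python
print(pi.data)    # ok = a < b and flags & 1

# The same code as element content must escape both characters.
with_text = minidom.parseString(
    "<root><code>ok = a &lt; b and flags &amp; 1</code></root>"
)
code = with_text.documentElement.firstChild
# Join the text nodes in case the parser reports the content in pieces.
text = "".join(node.data for node in code.childNodes)
print(text == pi.data)  # True
```

Both forms carry the same string to the application; the processing instruction is simply easier to write and read by hand.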

Wednesday, January 26, 2005

The IETF has released RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, as an official standard (only the 66th they've published). This replaces RFC 2396 as the official definition of URIs, and it's about time. RFC 2396 looked simple on the surface, but when you started digging into it, it became obvious that it had lots of unaddressed cases and unanswered questions. 3986 is a vast improvement. For instance, it finally requires that non-ASCII characters be encoded in UTF-8 prior to percent escaping, rather than whatever character set the author happens to prefer. Given the increasing importance of URIs to everything from billboards to the semantic web, it's a wonder we've come as far as we have on such shaky foundations.

In addition, the IETF has issued RFC 3987, Internationalized Resource Identifiers (IRIs), as a proposed standard. IRIs are basically the same as URIs except that you don't have to escape non-ASCII characters, so you can write http://www.cafeconleche.org/reports/cω.html instead of http://www.cafeconleche.org/reports/c%CF%89.html. A lot of XML-related specs such as XInclude either implicitly or explicitly use IRIs instead of URIs.
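The relationship between the two forms can be sketched in a few lines of Python: encode each non-ASCII character as UTF-8 and percent-escape the resulting bytes. The function name iri_to_uri is my own, and this is a simplification that leaves ASCII characters alone and ignores any special treatment of internationalized host names.

```python
def iri_to_uri(iri):
    """Percent-escape the UTF-8 bytes of every non-ASCII character.

    Simplification: this ignores the separate handling that internationalized
    host names may need, so treat it as a sketch for paths and queries only.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII characters pass through untouched.
            out.append(ch)
        else:
            # One %XX escape per UTF-8 byte of the character.
            out.extend("%%%02X" % byte for byte in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://www.cafeconleche.org/reports/cω.html"))
# http://www.cafeconleche.org/reports/c%CF%89.html
```

The ω becomes the two bytes CF 89 in UTF-8, which is where the %CF%89 in the URI form comes from.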

Tuesday, January 25, 2005

IBM's developerWorks has published my latest article, An Early Look at sXBL. sXBL is a descendant of Mozilla's XBL. The goal for it is fairly limited: really just a macro language for Scalable Vector Graphics (SVG). However, I think it has more potential than perhaps its inventors realize, as I try to make clear in the article. In particular, I think it could become a very important stylesheet language that can take on tasks XSL and CSS just can't handle.

Monday, January 24, 2005

This Thursday, January 27th, I will be talking about Effective XML at the monthly meeting of the New York Linux User's Group in Manhattan. The meeting starts at 6:30, and takes place in midtown Manhattan. Admission is free, but preregistration is required to pass through the security in the IBM building. You must preregister by 2:30 P.M. EST Wednesday! There'll be a plentiful selection of giveaways, and an after-meeting adjournment to a local tavern. Hope to see you there!

Sunday, January 23, 2005

Sonic Software has released Stylus Studio 6 XML Professional Edition, Release 2, a $495 payware XML editor for Windows. Features include:

XML differencing
XSLT debugging
XSLT mapping
XSLT profiling
XSL-FO
XQuery editing, mapping, and debugging.
XML Schema Editor
Document Type Definition (DTD) Editor
XPath Evaluator
XPath Expression Generator
Web Service Call Composer
UDDI Registry Browser
Tools for mapping to and from XML documents, Web service data, relational data, and flat files
Import/export utilities for RDBMS, XML, CSV, ADO, and flat files
JSP Editor
XSLT 2.0 Editor and Debugger
Supports the July 2004 XQuery 1.0 working drafts
Convert flat files, binary data, EDI, and other formats to XML
XML grid view for editing tabular XML data

New features in this release include support for Electronic Data Interchange (EDI)-to-XML mapping, support for Saxon 8.1.1, integration with Mark Logic Content Interaction Server 2.2 and Sleepycat DB XML 2.0, updated XML editing and validation support for the XSV 2.8 XML processor, simultaneous text-diagram view/editing in the XML Schema Editor, in-place editing of any XML Schema component references, and support for substitution groups.

Saturday, January 22, 2005

The W3C Device Independence Working Group (DIG) has published a note on Delivery Context Overview for Device Independence. "This document provides an overview of the role of delivery context in achieving a device independent Web. It describes the kind of information that may be included in the delivery context, and how it may be used. It surveys current techniques for conveying delivery context information, and identifies further developments that would enhance the ability to adapt content for different access mechanisms."

The DIG group has also published the second working draft of Glossary of Terms for Device Independence. This defines a number of terms such as authored unit, browser, client, HTTP server, harmonized user experience, perceived unit, and so forth. New terms defined in this draft include aggregation, aggregated authored, physical transducer, single authoring, multiple authoring, and flexible authoring. The definitions of delivery context and decomposition have been modified.

Friday, January 21, 2005

Ryan Tomayko has posted Kid 0.4, "a simple Pythonic template language for XML based vocabularies. It was spawned as a result of a kinky love triangle between XSLT, TAL, and PHP." The language is based on just six attributes: kid:for, kid:if, kid:def, kid:content, kid:omit, and kid:replace; each of which contains a Python expression. Since this expression can point to externally defined functions, this is most of what you need. In addition there are attribute value templates similar to XSLT's, and <?kid?> processing instructions can embed code directly in the XML document. I'm not sure I approve of the use of processing instructions in the language, but I'm not sure I don't either. Not having to escape XML-significant symbols like < and & in the embedded code is convenient. Kid templates are compiled to Python byte-code and can be imported and invoked like normal Python code. Kid templates generate SAX events and can be used with existing libraries that work along SAX pipelines. This release changes the license from GPL to MIT. Overall it looks like a fairly well-designed, well-thought out system that has clearly learned from the mistakes of gnarly systems like PHP, JSP, and ASP. Why am I not surprised to see this coming out of the Python community?

Thursday, January 20, 2005

As if RSS 0.9, 0.91, 0.92, 1.0, and 2.0 weren't enough to deal with, now there's RSS 1.1. I think the authors are missing the forest for the trees here. While there are some small improvements in RSS 1.1 relative to RSS 1.0 (which is a completely different beast than RSS 0.9x and RSS 2.0), they simply do not outweigh the cost of expanding market confusion and incompatibility. Oh well, maybe if we're lucky, this will be the straw that breaks the camel's back, and convinces the world to just move forward to Atom, leaving RSS in the dustbin of history where it belongs. (Hmm, three tired cliches in one paragraph. Is that a personal record? Guess I'm just not feeling very creative this morning.)

Google has proposed using a rel="nofollow" attribute on a elements in comments and trackbacks to prevent comment spam. At first I thought this was a very neat idea (after checking the HTML spec to see that this is indeed a valid HTML attribute, of course). On further reflection, I'm not so sure. It makes sense for trackbacks, which really aren't that relevant. However, legitimate, non-spam links in comments should not be penalized. I'm afraid blog and comment systems will simply start putting rel="nofollow" on all links, which is a tad too aggressive for my tastes. Perhaps we could allow this to be configured on a per-user basis by the site maintainer? Longer term, I'd really like the search engines to be smart enough to figure out what's link spam and what isn't. If Bayesian analysis works for e-mail spam it should work for comment spam too, especially since all the search engine really has to do is notice the relatively small percent of sites that attract a disproportionate number of links from comments and trackbacks.
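The per-user idea could look something like the sketch below. It assumes comments arrive as well-formed XHTML fragments with a single root element and that the site keeps its own set of trusted authors; the function name, the author names, and the sample comment are all hypothetical.

```python
import xml.etree.ElementTree as ET


def mark_untrusted_links(comment_xhtml, author, trusted_authors):
    """Add rel="nofollow" to every a element unless the author is trusted.

    comment_xhtml must be a well-formed XHTML fragment with a single root.
    Any existing rel value on a link is overwritten in this simple version.
    """
    root = ET.fromstring(comment_xhtml)
    if author not in trusted_authors:
        for link in root.iter("a"):
            link.set("rel", "nofollow")
    return ET.tostring(root, encoding="unicode")


comment = '<p>See <a href="http://example.com/">this page</a>.</p>'
trusted = {"alice"}  # hypothetical list kept by the site maintainer

print(mark_untrusted_links(comment, "mallory", trusted))
print(mark_untrusted_links(comment, "alice", trusted))
```

Links from the untrusted author come back with rel="nofollow", while the trusted author's comment is returned with its links untouched.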

SyncroSoft has released version 5.1 of the <Oxygen/> XML editor. Oxygen supports XML, XSL, DTDs, XQuery, SVG, and the W3C XML Schema Language. New features in version 5.1 include code folding, code templates, improved Relax NG support, Schematron validation, and the ability to use MSXML and XSLTProc as XSLT transformers. It costs $128 with support. Upgrades from pre 5.0 versions are $76. Upgrades from 5.0 are free.

Wednesday, January 19, 2005

The Mozilla Project has posted the sixth alpha of Mozilla 1.8. New features in 1.8 include FTP uploads, improved junk mail filtering, better Eudora import, and an increase in the number of cookies that Mozilla can remember. It also makes various small user interface improvements, gives users the option to disable CSS globally or on a per-page basis, and adds support for CSS quotes. Alpha 6 fixes a slew of bugs and adds some small features on Windows and Linux. Most notably, it upgrades the built-in XML parser from Expat 1.2 to Expat 1.95.7. "Because of that upgrade, Mozilla complies better with both the XML specification and the Namespaces in XML specification. Starting with 1.8a6, Mozilla will reject some invalid XML documents that it used to accept in prior versions. This has already led to some problems with extensions that used invalid XML documents, but the only solution is to correct those documents. Here are some of the well-formedness errors that Mozilla catches since the upgrade: - Invalid entity names (for example entity names containing a colon) - Undeclared prefixes - Multiple attributes with same localname and different prefixes bound to the same namespace names - Redeclaration of reserved prefixes and namespace names."
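Python's standard ElementTree parser is also built on Expat with namespace processing turned on, so it is a convenient way to see this class of error for yourself. The sketch below is not Mozilla's code and may not match the Expat version Mozilla ships; the check helper and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check(label, text):
    """Report whether the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(text)
        print(label, "accepted")
        return True
    except ET.ParseError as err:
        print(label, "rejected:", err)
        return False


# A prefix that is bound to a namespace is fine.
check("declared prefix", '<x:doc xmlns:x="http://example.com/ns"/>')

# The same document without the declaration is a namespace error.
check("undeclared prefix", "<x:doc/>")

# Two prefixes bound to one namespace name make these attributes collide.
check(
    "colliding attributes",
    '<doc xmlns:a="http://example.com/ns" xmlns:b="http://example.com/ns" '
    'a:id="1" b:id="2"/>',
)
```

A parser that is not namespace aware would accept all three documents, which is roughly the position older Mozilla builds were in.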

Tuesday, January 18, 2005

Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.4, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. Version 1.4.4 adds support for RSS 0.9 and fixes bugs. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

Monday, January 17, 2005

The Gnome Project has released version 2.6.17 of libxml2, the open source XML C library for Gnome. This release fixes various bugs. It also makes some improvements to W3C XML schema and Python support, and adds a few bits and pieces to the API.

Ian E. Gorman has released GXParse 1.6, a free (LGPL) Java library that sits on top of a SAX parser and provides semi-random access to the XML document. The documentation isn't very clear, but as near as I can tell, it buffers various constructs like elements until their end is seen, rather than dumping pieces on you immediately like SAX does. This release adds documentation and example code. Internally, the code has been refactored somewhat.

Sunday, January 16, 2005

Eric S. Raymond has released doclifter 2.1, an open source tool that transcodes {n,t,g}roff documentation to DocBook. It also translates man, mandoc, ms, me, and TkMan source documents. Raymond claims the "result is usable without further hand-hacking about 95% of the time." This release fixes bugs. Doclifter is written in Python, and requires Python 2.2a1. doclifter is published under the GPL.

Saturday, January 15, 2005

Thanks to repeated reader requests and a rainy Friday afternoon with not a lot better to do, I've added an RSS feed for the recommended reading. To answer one frequently asked question, no, I am not planning to add the quote of the day to the RSS feed. Too many of the quotes require formatting beyond what RSS can handle. Possibly at some point in the future I'll consider whipping up an Atom feed for these.

Friday, January 14, 2005

The W3C RDF Data Access Working Group has published the first public working draft of SPARQL Protocol for RDF. "The RDF Query Language SPARQL expresses queries over RDF graphs. This document defines a protocol for communicating those queries to an RDF data service."

Thursday, January 13, 2005

The W3C Voice Browser Working Group has published the second last call working draft of Voice Browser Call Control: CCXML Version 1.0. According to the spec abstract, "CCXML is designed to provide telephony call control support for dialog systems, such as VoiceXML [VOICEXML]. While CCXML can be used with any dialog systems capable of handling media, CCXML has been designed to complement and integrate with a VoiceXML interpreter. Because of this there are many references to VoiceXML's capabilities and limitations. There are also details on how VoiceXML and CCXML can be integrated. However, it should be noted that the two languages are separate and are not required in an implementation of either language. For example, CCXML could be integrated with a more traditional Interactive Voice Response (IVR) system or a 3GPP Media Resource Function (MRF), and VoiceXML or other dialog systems could be integrated with some other call control systems." Changes in this draft include:

Better documentation of the ccxml.kill events
Clean up of the fetch/goto/createccxml framework
Consistent formatting of element attribute tables
Cleanup of media duplex model
Addition of a writeable application scope.
Changes the script element to allow statically compiled scripts
Better definition of the event handling algorithm
Removed regular expressions from event names on transition
Better definition of move
Updated conference object
Added merge element

Wednesday, January 12, 2005

The W3C CSS working group has updated the working draft of CSS3 Speech Module. This spec defines CSS properties used when documents are read out loud. These include voice-volume, voice-balance, speak, pause-before, pause-after, pause, cue-before, cue-after, cue, mark-before, mark-after, mark, voice-rate, voice-family, voice-pitch, voice-pitch-range, voice-stress, voice-duration, phonemes, and @phonetic-alphabet. This draft aligns "the definitions with the latest version of SSML as it reached W3C Recommendation status. This effects voice-volume, voice-rate, voice-pitch, voice-pitch-range, and voice-stress, where the enumerated logical values are now defined as monotonically non-decreasing sequences to match SSML. Named relative values such as louder and softer have been dropped since they are not supported by SSML and can't be related through percentage changes to the enumeration of logical values."

Tuesday, January 11, 2005

The GNU Project has released GNU JAXP 1.3, a free-as-in-speech (GPL with library exception) implementation of the Java API for XML Processing. This is the last release as a standalone package. In the future it will be maintained as part of GNU Classpath. The web page is admirably written. I wish the web pages for most payware software did as good a job of explaining concisely and clearly what the product is, what it does, how to get it, and everything else a potential user needs to know. I don't think I can do better than just quote it:

GNU JAXP provides the Ælfred2 SAX2 parser, and is configured to use its optionally validating module by default. It includes an implementation of the DOM Level 3 interfaces provided by the World Wide Web Consortium. There is also a (fast!) XPath 1.0 and XSLT 1.0 implementation which supports all options for input (source) and output (result). These are accessible through the JAXP 1.3 and W3C DOM bootstrapping APIs.

The following W3C DOM APIs are supported:

DOM Level 3 Core
DOM Level 3 Load & Save
DOM Level 3 XPath
DOM Level 2 Events
DOM Level 2 Traversal

GNU JAXP additionally provides a mostly-complete alternative implementation of DOM Level 3 Core and XPath, a SAX2 parser, and a JAXP XSLT transformer that use the Gnome libxml2 and libxslt libraries. These libraries, implemented in C, provide very fast parsing and transformation. The libxmlj library, a JNI wrapper for the Gnome libraries, allows you to leverage this speed via the JAXP interfaces. However, libxmlj is still experimental and does not conform as well to the published SAX and DOM conformance tests, and JNI deployment may be a barrier to entry.

At the current time, libxmlj does not support the JAXP XPath interface, only the W3C DOM one.

GNU JAXP is not a complete implementation of JAXP 1.3. It does not currently provide support for tree validation: the only validation available is XML (DTD) validation by the SAX parser. We are still working on W3C XML Schema and RELAX NG validators and the integration of the PSVI into the DOM.

Java 1.4 or later is required.

Monday, January 10, 2005

The W3C Web Services Choreography Working Group has posted the last call working draft of Web Services Choreography Description Language Version 1.0. According to the abstract,

The Web Services Choreography Description Language (WS-CDL) is an XML-based language that describes peer-to-peer collaborations of parties by defining, from a global viewpoint, their common and complementary observable behavior; where ordered message exchanges result in accomplishing a common business goal.

The Web Services specifications offer a communication bridge between the heterogeneous computational environments used to develop and host applications. The future of E-Business applications requires the ability to perform long-lived, peer-to-peer collaborations between the participating services, within or across the trusted domains of an organization.

The Web Services Choreography specification is targeted for composing interoperable, peer-to-peer collaborations between any type of party regardless of the supporting platform or programming model used by the implementation of the hosting environment.

Comments are due by January 31.

Sunday, January 9, 2005

The W3C Web Services Description Working Group has published the first public working draft of Web Services Description Language (WSDL) Version 2.0 Part 0: Primer. "It is intended for readers who wish to have an easier, less technical introduction to the main features of the language." The primer is based on an example of a hotel reservation service.

Saturday, January 8, 2005

The W3C Synchronized Multimedia working group has released the second edition of Synchronized Multimedia Integration Language (SMIL 2.0). According to the spec, "This second edition of SMIL 2.0 is not a new version, it merely incorporates the changes dictated by the corrections to errors found in the first edition as agreed by the SYMM Working Group, as a convenience to readers."

Thursday, January 6, 2005

I'm very pleased to announce the release of XOM™ 1.0, a new XML object model. XOM is a free-as-in-speech (LGPL), library for processing XML with Java. XOM supports a number of XML technologies including Namespaces in XML, XSLT, XInclude, and Canonical XML. XOM documents can be converted to and from SAX and DOM. XOM strives for correctness, simplicity, and performance, in that order. XOM is very easy to learn and easy to use. It works very straight-forwardly, and has a very shallow learning curve. Assuming you're already familiar with XML, you should be able to get up and running with XOM very quickly.

XOM is the only XML API that makes no compromises on correctness. XOM only accepts namespace well-formed XML documents, and only allows you to create namespace well-formed XML documents. (In fact, it's a little stricter than that: it actually guarantees that all documents are round-trippable and have well-defined XML infosets.) XOM manages your XML so you don't have to. With XOM, you can focus on the unique value of your application, and trust XOM to get the XML right.

XOM is fairly unique in that it is a dual streaming/tree-based API. Individual nodes in the tree can be processed while the document is still being built. This enables XOM programs to operate almost as fast as the underlying parser can supply data. You don't need to wait for the document to be completely parsed before you can start working with it.

XOM is very memory efficient. If you read an entire document into memory, XOM uses as little memory as possible. More importantly, XOM allows you to filter documents as they're built so you don't have to build the parts of the tree you aren't interested in. For instance, you can skip building text nodes that only represent boundary white space, if such white space is not significant in your application. You can even process a document piece by piece and throw away each piece when you're done with it. XOM has successfully processed gigabyte sized documents without breaking a sweat.
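XOM is a Java library, and the following is not XOM code. It is only a rough Python analogue, using the standard library's iterparse, of the process-a-piece-then-throw-it-away style described above. The orders document and its element names are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Small stand-in document; a real input could be a very large file.
data = io.BytesIO(
    b"<orders>"
    b"<order><total>10</total></order>"
    b"<order><total>25</total></order>"
    b"<order><total>7</total></order>"
    b"</orders>"
)

grand_total = 0
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "order":
        # Each order is complete here even though parsing continues.
        grand_total += int(elem.findtext("total"))
        # Discard the finished piece so its children and text can be freed.
        elem.clear()

print(grand_total)  # 42
```

The program sees each order as soon as its end tag is parsed and never holds more than one fully populated order in memory at a time.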

It's at least a year past when I hoped to release XOM, but the extra time has resulted in a much cleaner, more robust, faster API. XOM is now considered to be ready for production use. Future, post-1.0 releases should be backwards compatible with the 1.0 API for the foreseeable future.

If you'd like to know more about XOM, I suggest starting with the tutorial. XOM also includes a large collection of small sample programs that demonstrate various parts of the library. If you're curious about why XOM is the way it is, or if you would like to suggest future directions for XOM, you should read the design principles on which XOM is based. If you have a question about XOM that is not answered in the API documentation or the FAQ, you can ask it on the xom-interest mailing list. You do not need to be subscribed to post, but non-subscriber questions are moderated.

The Omni Group has released OmniWeb 5.1, a $29.95 payware web browser for Mac OS X. OmniWeb 5.x is based on the same KHTML engine Safari uses, so it supports the same basic functionality: XML, HTML, XHTML, and CSS, but not XSLT. (I'm hoping we'll see XSLT functionality in Safari 2 next week at MacWorld, and maybe if we're really lucky it won't require a full OS upgrade, but don't hold your breath. Since Jobs returned, Apple has a horrible track record of supporting older OS releases.) Version 5.1 allegedly improves performance. However, the demo crashed on the second page I tried to load. The second time I launched it, it loaded fine. A nice feature is that the demo gives you 30 days to try it out, rather than 30 days from first launch. Handy if you just want to use it occasionally to test sites rather than as your main browser.

The W3C Privacy Activity has posted the fourth public working draft of the Platform for Privacy Preferences 1.1 (P3P1.1) Specification. "P3P 1.1 is based on the P3P 1.0 Recommendation and adds some features using the P3P 1.0 Extension mechanism. It also contains a new binding mechanism that can be used to bind policies for XML Applications beyond HTTP transactions." New features in P3P 1.1 include a mechanism to name and group statements together so user agents can organize the summary display of those policies and a generic means of binding P3P Policies to arbitrary XML to support XForms, WSDL, and other XML applications.

Wednesday, January 5, 2005

Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.3, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. Version 1.4.3 adds an XML writer API, an RSS 1.0 serializer, and allows user namespace declarations and relative URIs. A new was added. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

Kiyut has released Sketsa 2.2.2, a $29 payware SVG editor written in Java. Java 1.4.1 or later is required. 2.2.2 is a bug fix release.

Tuesday, January 4, 2005

Sleepycat Software has released Berkeley DB XML 2.0.9, an open source "application-specific, embedded data manager for native XML data" based on Berkeley DB. It supports the July working drafts of XQuery 1.0 and XPath 2.0. It includes C++, Java, Perl, Python, TCL and PHP APIs. 2.0.9 is a bug fix release.

Monday, January 3, 2005

I've posted a new release candidate of XOM that includes new README and LICENSE files and an improved Ant build file that only compiles the servlet samples if the servlet classes are found somewhere in the classpath. The API and behavior are unchanged. If nobody spots any major problems in this release (what people have been finding lately has mostly been packaging issues), I'll probably release 1.0 in a few days.

Sunday, January 2, 2005

The W3C RDF Data Access Working Group has published the first public working draft of SPARQL Variable Binding Results XML Format. "This document defines a format for encoding variable binding results made by the SPARQL Query Language for RDF [SPARQL-QUERY] in XML".

Saturday, January 1, 2005

Opera Software has posted the first beta of version 8.0 of their namesake web browser for Windows. However, this appears to be what was previously called 7.6. Apparently, they decided the changes were significant enough to justify a major version number. There are lots of little changes, bug fixes, and usability enhancements in 8.0. However, major new features include speech-enabled browsing (including support for XHTML+Voice), medium-screen rendering, and inline error pages. Opera supports HTML, XML, XHTML, RSS, WML 2.0, and CSS. XSLT is not supported. Other features include IRC, mail, and news clients and pop-up blocking. Opera is $39 payware.

News from 2004

News from 2003

News from 2002

News from 2001

News from 2000

News from 1998

News from 1999

Elliotte Rusty Harold

elharo@ibiblio.org