XML News from Sunday, May 21, 2006

John Cowan has posted the sixth release candidate of TagSoup, an open source SAX parser written in Java for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. RC6 focuses on namespaces. According to Cowan,

"This release fixed a bunch of bugs around namespaces. The SAX spec was a little hard to follow, so I am now doing a subset of what Xerces does, in hopes that that will be compatible with what most SAX applications expect. In particular, the namespace-prefixes feature is now false by default, as it should be, and cannot be made true. It used to be true by default, but did not meet the contract that implies, namely that xmlns: attributes would be provided to the application.

"This also involved fixing a bug in XMLWriter that made it work incorrectly when the namespaces feature is false. In addition, most people don't want namespaces in HTML mode, so --html now implies --nons. To get the namespaces back, use --no-xml-declaration --method=html instead."
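For readers unsure what the namespace-prefixes contract means in practice, here is a minimal sketch. It uses the JDK's bundled SAX parser rather than TagSoup itself, an assumption made so that it runs without extra jars. The class name NamespacePrefixesDemo, the helper countRootAttributes, and the sample document are all hypothetical. The sketch parses one element that carries a namespace declaration plus one ordinary attribute, and reports how many attributes the application sees with the feature off and on.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespacePrefixesDemo {
    // Standard SAX2 feature name for reporting xmlns attributes to the application.
    static final String NS_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // Hypothetical sample: one namespace declaration and one ordinary attribute.
    static final String DOC = "<a xmlns:x=\"urn:example\" x:b=\"1\"/>";

    // Parses DOC with the JDK's built-in parser (not TagSoup) and returns
    // the number of attributes passed to startElement for the root element.
    static int countRootAttributes(boolean namespacePrefixes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // turns the SAX namespaces feature on
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(NS_PREFIXES, namespacePrefixes);

        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0] = atts.getLength();
            }
        });
        reader.parse(new InputSource(new StringReader(DOC)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-prefixes=false: " + countRootAttributes(false));
        System.out.println("namespace-prefixes=true:  " + countRootAttributes(true));
    }
}
```

With the feature false, the xmlns:x declaration should be withheld from the attribute list and only x:b reported. With it true, both should appear. That second behavior is the contract earlier TagSoup builds advertised but, per Cowan, did not honor.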

TagSoup is dual licensed under the Academic Free License and the GPL.