XML News from Saturday, July 1, 2006

John Cowan has released TagSoup 1.0.1, an open source, Java-language SAX parser for nasty, ugly HTML. According to Cowan, "Previous versions of TagSoup always ignored whitespace in elements that don't have PCDATA as a possible child. Now, if you turn on the ignorableWhitespaceFeature (or use the --ignorable option), that whitespace will be returned to your application through the previously unused ContentHandler.ignorableWhitespace callback. This isn't done by default for backwards compatibility, and also because HTML is an SGML application and SGML parsers routinely dropped such whitespace." This release also fixes a couple of bugs where TagSoup could report malformed comments and public identifiers.
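
For readers who have not used the ignorableWhitespace callback before, here is a minimal sketch of how a SAX handler receives it. To stay self-contained it does not use TagSoup at all: it runs the JDK's built-in SAX parser over a tiny XML document whose internal DTD declares that ul may contain only li elements, so the whitespace between them is ignorable. The class name IgnorableWhitespaceDemo, the Counter handler, and the sample document are all made up for illustration. With TagSoup on the classpath, one would presumably substitute its parser as the XMLReader and turn on the feature Cowan names with setFeature, but that wiring is an assumption and is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {

    // Hypothetical handler: tallies ignorable whitespace separately from real text.
    static class Counter extends DefaultHandler {
        int ignorableChars = 0;
        final StringBuilder text = new StringBuilder();

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // Whitespace in elements that cannot contain character data.
            ignorableChars += length;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Ordinary character data; may arrive in several chunks.
            text.append(ch, start, length);
        }
    }

    static Counter parse(String xml) throws Exception {
        // Stand-in for TagSoup: the JDK's default (non-validating) SAX parser.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Counter handler = new Counter();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // The internal DTD says ul has element-only content, li has PCDATA.
        String doc = "<!DOCTYPE ul [<!ELEMENT ul (li)*><!ELEMENT li (#PCDATA)>]>\n"
                   + "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>";
        Counter c = parse(doc);
        System.out.println("ignorable whitespace chars: " + c.ignorableChars);
        System.out.println("text: " + c.text);
    }
}
```

An application that does not care about such whitespace can simply leave ignorableWhitespace unimplemented, since DefaultHandler's version does nothing. That matches the backwards-compatible default Cowan describes.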