Chapter 6. SAX

Table of Contents

What is SAX?
Parsing
Callback Interfaces
Implementing ContentHandler
Using the ContentHandler
The DefaultHandler Adapter Class
Receiving Documents
Receiving Elements
Handling Attributes
Receiving Characters
Receiving Processing Instructions
Receiving Namespace Mappings
Ignorable White Space
Receiving Skipped Entities
Receiving Locators
What the ContentHandler Doesn’t Tell You
Summary

At its core, SAX, the Simple API for XML, is based on just two interfaces: XMLReader, which represents the parser, and ContentHandler, which receives data from the parser. These two interfaces alone suffice for about 90% of what you need to do with SAX. This chapter shows the basic operation of XMLReader and discusses ContentHandler in detail. The next chapter explores a variety of ways to customize the parsing process through the more advanced features of the XMLReader interface.
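
As a rough illustration of how these two interfaces cooperate, the following minimal sketch registers a ContentHandler with an XMLReader and collects the local name of each element the parser reports. The class name ElementLister and the method listElements are hypothetical, invented for this example. The sketch also makes two assumptions: it obtains the XMLReader through the JAXP SAXParserFactory bundled with the JDK, which is only one of several ways to get a parser, and it extends DefaultHandler, a convenience class in org.xml.sax.helpers that implements ContentHandler with do-nothing methods, so that only startElement needs to be written.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of SAX itself.
class ElementLister {

    // Parses the XML text and returns the local names of its
    // elements in the order the parser reports their start-tags.
    static List<String> listElements(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            // Assumption: use the JDK's JAXP factory to obtain a parser.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is always supplied
            XMLReader parser = factory.newSAXParser().getXMLReader();

            // The parser calls back into this handler as it reads the document.
            parser.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String namespaceURI, String localName,
                        String qualifiedName, Attributes atts) {
                    names.add(localName);
                }
            });

            parser.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped in an unchecked exception only to keep the sketch short.
            throw new IllegalStateException("Could not parse document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [order, item, item]
        System.out.println(listElements("<order><item/><item/></order>"));
    }
}
```

The client code never reads the document itself. It hands the input to the XMLReader and waits for the parser to invoke the ContentHandler methods, which is the event-oriented pattern the rest of this chapter explores.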

What is SAX?

The Simple API for XML, SAX, was invented in late 1997 and early 1998, when Peter Murray-Rust and several authors of XML parsers written in Java decided there wasn’t much point in maintaining multiple similar yet incompatible APIs that did exactly the same thing. Murray-Rust was the first to suggest what he called “YAXPAPI”, Yet Another XML Parser API. He was thoroughly sick of supporting multiple, incompatible XML parsers for his parser-client application JUMBO, and wanted a standard API everyone could agree on. Parser authors Tim Bray and David Megginson quickly signed on to the project, and work began in public on the xml-dev mailing list, where many people participated. Megginson wrote the initial draft of SAX. After a short beta period, SAX 1.0 was released on May 11, 1998.

SAX was designed around abstract interfaces rather than concrete classes so it could be layered on top of parsers’ existing native APIs. SAX is not the most sophisticated XML API imaginable, but that’s part of its beauty. The ease with which SAX could be implemented by many parser vendors with very different architectures contributed to its success and rapid standardization.

Although SAX is very much a de facto standard, it has not gone through any formal standardization process. Its development was open to anyone interested. All you had to do was join the xml-dev mailing list and participate in the discussions. The end result was explicitly placed in the public domain. It is free to be implemented or extended by anyone for any purpose without permission from anybody. It is not copyrighted or trademarked. As far as is known, no parts of it are patented by anyone either.

In late 1999, work began on SAX2. This was a radical reformulation of SAX that, while maintaining the same basic event-oriented architecture, replaced almost every class in SAX1. The main impetus for this radical shift was the need to make SAX namespace aware. However, many other new capabilities were added in SAX2, including filters and optional support for lexical events and DTDs. SAX2 was finished in May 2000, and has proven even more successful than SAX1. Indeed, SAX2 is the most complete XML API available anywhere. As of 2002, every major parser that supports SAX at all supports SAX2. There is no reason to learn or concern yourself with the older classes and interfaces from SAX1, and henceforth I will discuss SAX2 exclusively.

For the first few years of its life, the official SAX distribution and documentation were maintained by David Megginson. However, he recently passed the torch to David Brownell, who has begun work on SAX 2.1. At the time of this writing, SAX 2.1 seems unlikely to be as radical a shift relative to SAX2 as SAX2 was relative to SAX1. Version 2.1 will add a few bits of information from the XML document that are not exposed by SAX2, such as the encoding declaration. However, no SAX2 classes, interfaces, or methods will be deprecated in SAX 2.1, and only programmers with very special needs will have to concern themselves with its new functionality.


Copyright 2001, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified May 26, 2002