Chapter 7. The XMLReader Interface

Table of Contents

Building Parser Objects
Input
    InputSource
    EntityResolver
Exceptions and Errors
    SAXExceptions
    The ErrorHandler interface
Features and Properties
    Getting and Setting Features
    Getting and Setting Properties
    Required Features
    Standard Features
    Standard Properties
    Xerces Custom Features
    Xerces Custom Properties
DTDHandler
Summary

The XML specification grants parsers a sometimes confusing amount of leeway in processing XML documents. Parsers are allowed to validate or not, resolve external entities or not, treat non-deterministic content models as errors or not, support non-standard encodings or not, check for namespace well-formedness or not, and much more. Depending on exactly which choices two parsers make for all these options, they can actually produce quite different pictures of the same XML document. Indeed, in a few cases one parser may even report a document to be well-formed while another reports that the document is malformed.

To support the wide range of capabilities of different parsers, the XMLReader interface that represents parsers in SAX is quite deliberately non-specific. It can be instantiated in a variety of different ways. It can read XML documents stored in a variety of media. It can be configured with features and properties both known and unknown. This chapter explores in detail the configuration and use of XMLReader objects.

Building Parser Objects

Since XMLReader is an interface, it has no constructors. Instead you use the static factory method XMLReaderFactory.createXMLReader() to retrieve an instance of XMLReader. In fact, there are two such methods in the XMLReaderFactory class:

public static XMLReader createXMLReader()
    throws SAXException;

public static XMLReader createXMLReader(String className)
    throws SAXException;

The first one returns an instance of the default XMLReader implementation class. This class is specified by the org.xml.sax.driver Java system property. Parser vendors are supposed to modify this method to return an instance of their own parser in the event that this property is not set, though in practice few do this. Consequently, when running a program that relies on XMLReaderFactory.createXMLReader(), you may want to set the org.xml.sax.driver Java system property at the command line using the -D flag to the interpreter like this:

% java -Dorg.xml.sax.driver=com.fully.package.qualified.ParserClass MainClass
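
To see which parser the no-argument method actually hands you on a given system, something like the following minimal sketch may help. The class name DefaultParserDemo and its two methods are hypothetical names chosen for illustration; only System.getProperty() and XMLReaderFactory.createXMLReader() come from the standard libraries. The sketch assumes a SAX distribution that can supply some default parser. The one bundled with recent JDKs can, though XMLReaderFactory is marked deprecated there. With a distribution that has no built-in default, createXMLReader() throws a SAXException when the property is not set.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical demo class; not part of SAX or of any parser distribution.
public class DefaultParserDemo {

  // Returns the value of the org.xml.sax.driver system property,
  // or null if the property has not been set.
  public static String configuredDriver() {
    return System.getProperty("org.xml.sax.driver");
  }

  // Asks the factory for whatever it considers the default parser.
  public static XMLReader loadDefault() throws SAXException {
    return XMLReaderFactory.createXMLReader();
  }

  public static void main(String[] args) throws SAXException {
    System.out.println("org.xml.sax.driver = " + configuredDriver());
    XMLReader parser = loadDefault();
    System.out.println("Loaded " + parser.getClass().getName());
  }

}
```

Running this with and without -Dorg.xml.sax.driver=... on the command line shows the effect of the property without any change to the source code.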

If there are multiple versions of the SAX classes in your class path, then whichever one the virtual machine finds first gets to choose which XMLReader implementation class to give you. However, if you know you want a specific class (e.g. org.apache.xerces.parsers.SAXParser) then you can ask for it by fully package-qualified name using the second XMLReaderFactory.createXMLReader() method. For example, this code asks for the Xerces parser by name:

XMLReader parser = XMLReaderFactory.createXMLReader(
 "org.apache.xerces.parsers.SAXParser"
);

If the class you request can’t be located, createXMLReader() throws a SAXException. Since there’s no guarantee that any particular parser is installed on any given system where your code may run, you should be prepared to catch and respond to this. Normally the correct response is to fall back to the default parser, like this:

XMLReader parser;
try {
  parser = XMLReaderFactory.createXMLReader(
   "org.apache.xerces.parsers.SAXParser"
  );
}
catch (SAXException e1) {
  // The named parser isn't available; fall back to the default
  try {
    parser = XMLReaderFactory.createXMLReader();
  }
  catch (SAXException e2) {
    throw new NoClassDefFoundError("No SAX parser is available");
    // or whatever exception your method is declared to throw
  }
}

Alternately, you can try multiple known parser classes until you find one that’s available. This code searches for several of the major parsers in my personal order of preference, only falling back on the default parser if none of these can be found:

  XMLReader parser;
          
  try { // Xerces
    parser = XMLReaderFactory.createXMLReader(
     "org.apache.xerces.parsers.SAXParser"
    );
  }
  catch (SAXException e1) {
    try { // Crimson
      parser = XMLReaderFactory.createXMLReader(
       "org.apache.crimson.parser.XMLReaderImpl"
      );
    }
    catch (SAXException e2) { 
      try { // Ælfred
        parser = XMLReaderFactory.createXMLReader(
         "gnu.xml.aelfred2.XmlReader"
        );
      }
      catch (SAXException e3) {
        try { // Piccolo
          parser = XMLReaderFactory.createXMLReader(
            "com.bluecast.xml.Piccolo"
          );
        }
        catch (SAXException e4) {
          try { // Oracle
            parser = XMLReaderFactory.createXMLReader(
              "oracle.xml.parser.v2.SAXParser"
            );
          }
          catch (SAXException e5) {
            try { // default
              parser = XMLReaderFactory.createXMLReader();
            }
            catch (SAXException e6) {
              throw new NoClassDefFoundError(
                "No SAX parser is available");
                // or whatever exception your method is  
                // declared to throw
            }
          }
        }
      }
    } 
  }

I use this technique in my working code, and you’re more than welcome to copy it. However, because it’s quite long and repetitive, I’ll mostly stick to one named parser and a fallback to the default in the examples in this book.
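
The nested try blocks above can also be flattened into a loop over the candidate class names. What follows is only a sketch of that idea, not the code used elsewhere in this book: the class ParserLocator and its methods locate() and locatePreferred() are hypothetical names, while the parser class names are the ones listed above. Unlike the nested version, this sketch simply lets the final SAXException propagate if even the default parser can't be created, rather than converting it to a NoClassDefFoundError.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class; the names are illustrative only.
public class ParserLocator {

  // Candidate parser classes in order of preference
  private static final String[] PREFERRED = {
    "org.apache.xerces.parsers.SAXParser",      // Xerces
    "org.apache.crimson.parser.XMLReaderImpl",  // Crimson
    "gnu.xml.aelfred2.XmlReader",               // Ælfred
    "com.bluecast.xml.Piccolo",                 // Piccolo
    "oracle.xml.parser.v2.SAXParser"            // Oracle
  };

  // Tries each named class in turn and returns the first one that loads.
  // Falls back to the default parser if none of them is available.
  public static XMLReader locate(String... candidates)
   throws SAXException {
    for (String className : candidates) {
      try {
        return XMLReaderFactory.createXMLReader(className);
      }
      catch (SAXException e) {
        // This parser isn't installed; try the next candidate
      }
    }
    return XMLReaderFactory.createXMLReader();
  }

  // Convenience method that searches the built-in preference list.
  public static XMLReader locatePreferred() throws SAXException {
    return locate(PREFERRED);
  }

}
```

With a helper like this, the calling code shrinks to a single line such as XMLReader parser = ParserLocator.locatePreferred(); and changing the order of preference only means editing the array.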

I also occasionally see programs that use a constructor to retrieve an instance of a particular class. For example,

XMLReader parser = new SAXParser();

Or, worse yet,

SAXParser parser = new SAXParser();

This doesn’t let you do anything you can’t do with XMLReaderFactory.createXMLReader(). It does, however, tie your code tightly to one particular parser and make it harder to change parsers at a later date. At an absolute minimum, swapping in a different parser will require an edit and a recompile. If you use XMLReaderFactory.createXMLReader() instead, you can change parsers without even having access to the source code.
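
As a rough illustration of code that depends only on the XMLReader interface, consider the following sketch. The class ElementCounter and its countElements() method are hypothetical names invented for this example; the SAX calls it makes (setContentHandler(), parse(), and the DefaultHandler and InputSource classes) are standard. Nothing in it names a concrete parser class, so it should work with whichever implementation the factory returns.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; the names are illustrative only.
public class ElementCounter {

  // Counts the start-tags (including empty-element tags) in a document
  // using whatever XMLReader implementation the caller supplies.
  public static int countElements(XMLReader parser, String xml)
   throws SAXException, IOException {
    final int[] count = {0};
    parser.setContentHandler(new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName,
       String qName, Attributes atts) {
        count[0]++;
      }
    });
    parser.parse(new InputSource(new StringReader(xml)));
    return count[0];
  }

  public static void main(String[] args)
   throws SAXException, IOException {
    // The only place a parser is chosen; no concrete class is named
    XMLReader parser = XMLReaderFactory.createXMLReader();
    System.out.println(countElements(parser, "<a><b/><b/></a>"));
  }

}
```

Because the parser is chosen in exactly one place and only through the factory, switching implementations is a matter of changing the org.xml.sax.driver system property rather than editing and recompiling this class.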


Copyright 2001, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified November 11, 2001