Reading XML Documents with JDOM

Naturally, JDOM can read existing XML documents from files, network sockets, strings, or anything else you can hook a stream or reader to. JDOM does not, however, include its own native parser. Instead it relies on any of a number of very fast, well-tested SAX2 parsers such as Xerces and Crimson.

The rough outline for working with an existing XML document using JDOM is as follows:

  1. Construct an org.jdom.input.SAXBuilder object using a simple no-args constructor.

  2. Invoke the builder’s build() method to build a Document object from a Reader, InputStream, URL, File, or a String containing a system ID.

  3. If there’s a problem reading the document, an IOException is thrown. If there’s a problem building the document, a JDOMException is thrown.

  4. Otherwise, navigate the document using the methods of the Document class, the Element class, and the other JDOM classes.

The SAXBuilder class represents the underlying XML parser. Parsing a document from a URL is straightforward. Just create a SAXBuilder object with the no-args constructor and pass the string form of the URL to its build() method. This returns a JDOM Document object. For example,

 SAXBuilder parser = new SAXBuilder();
 Document doc = parser.build("http://www.cafeconleche.org/");
 // work with the document...

That’s all there is to it. If you prefer, you can build the Document from a java.io.File, a java.net.URL, a java.io.InputStream, a java.io.Reader, or an org.xml.sax.InputSource.

The build() method throws an IOException if an I/O error such as a broken socket prevents the document from being completely read. It throws a JDOMException if the document is malformed. JDOMException is the generic superclass for most things that can go wrong while working with JDOM other than I/O errors. Example 14.7 demonstrates a simple program that checks XML documents for well-formedness by looking for these exceptions.

Example 14.7. A JDOM program that checks XML documents for well-formedness

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import java.io.IOException;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL"); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // command line should offer URIs or file names
    try {
      builder.build(args[0]);
      // If there are no well-formedness errors, 
      // then no exception is thrown
      System.out.println(args[0] + " is well-formed.");
    }
    // indicates a well-formedness error
    catch (JDOMException e) { 
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }  
    catch (IOException e) { 
      System.out.println("Could not check " + args[0]);
      System.out.println(" because " + e.getMessage());
    }  
  
  }

}

I used this program to test my Cafe con Leche web site for well-formedness. It’s supposed to be well-formed XML, but I’m often sloppy. The results were informative:

D:\books\XMLJAVA\examples\14>java JDOMChecker http://www.cafeconleche.org/
http://www.cafeconleche.org/ is not well-formed.
Error on line 351 of document http://www.cafeconleche.org: The 
element type "img" must be terminated by the matching end-tag 
"</img>".

I fixed the problem. However, JDOM only reports the first error in a document, so it’s not surprising that running the program again uncovered a second problem:

D:\books\XMLJAVA\examples\14>java JDOMChecker http://www.cafeconleche.org/
http://www.cafeconleche.org/ is not well-formed.
Error on line 363 of document http://www.cafeconleche.org: The 
element type "input" must be terminated by the matching end-tag 
"</input>".

Several more problems turned up, one after another. Once I fixed the last one, everything finally checked out:

D:\books\XMLJAVA\examples\14>java JDOMChecker http://www.cafeconleche.org/
http://www.cafeconleche.org/ is well-formed.

Exactly which SAX parser JDOM uses to build documents depends on the local environment. By default, JDOM relies on JAXP to choose the parser class. If that fails, it picks Xerces. However, if you really care which parser is used, specify the fully package-qualified name of the XMLReader class you want as the first argument to the constructor. For example, this sets the parser to Crimson:

 SAXBuilder parser 
  = new SAXBuilder("org.apache.crimson.parser.XMLReaderImpl");
 Document doc = parser.build("http://www.cafeconleche.org/");
 // work with the document...
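
If you're curious which parser JAXP would select in your own environment, a small probe along the following lines may help. This is only a sketch that uses the JDK's JAXP classes directly rather than JDOM. It assumes that JDOM's default lookup ends up with the same XMLReader that SAXParserFactory supplies, and the class name ParserProbe is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of JDOM
class ParserProbe {

  // Returns the class name of the XMLReader that JAXP selects
  static String defaultReaderName() {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      XMLReader reader = factory.newSAXParser().getXMLReader();
      return reader.getClass().getName();
    }
    // no usable parser is configured in this environment
    catch (Exception e) {
      throw new IllegalStateException("JAXP could not supply a parser", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("JAXP picks: " + defaultReaderName());
  }

}
```

The name this prints varies with the JDK version and with whatever parser jars are on the class path, which is exactly why naming the XMLReader class explicitly is the safer choice when the parser matters.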

By default, SAXBuilder only checks documents for well-formedness, not validity. If you want to validate as well, then pass the boolean true to the SAXBuilder() constructor. Then any validity errors will also cause JDOMExceptions. Example 14.8 demonstrates this with a simple validation program.

Example 14.8. A JDOM program that validates XML documents

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import java.io.IOException;


public class JDOMValidator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMValidator URL"); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder(true);
                                    //  ^^^^
                                    // Turn on validation
     
    // command line should offer URIs or file names
    try {
      builder.build(args[0]);
      // If there are no well-formedness or validity errors, 
      // then no exception is thrown.
      System.out.println(args[0] + " is valid.");
    }
    // indicates a well-formedness or validity error
    catch (JDOMException e) { 
      System.out.println(args[0] + " is not valid.");
      System.out.println(e.getMessage());
    }  
    catch (IOException e) { 
      System.out.println("Could not check " + args[0]);
      System.out.println(" because " + e.getMessage());
    }  
  
  }

}

Here are the results from running this program across two documents, the first invalid (because it doesn’t even have a DTD) and the second valid:

D:\books\XMLJAVA\examples\14>java JDOMValidator http://cafeconleche.org/
http://cafeconleche.org/ is not valid.
Error on line 1 of document http://cafeconleche.org: Document root 
element "html", must match DOCTYPE root "null".
D:\books\XMLJAVA\examples\14>java JDOMValidator 
http://www.w3.org/TR/2000/REC-xml-20001006.html
http://www.w3.org/TR/2000/REC-xml-20001006.html is valid.

This does assume that the default parser JDOM picks can in fact validate, which is true of most modern parsers you’re likely to encounter. However, if you really want to make sure, you can always ask for a known validating parser by name. For example, this requests the Xerces SAXParser:

SAXBuilder parser 
 = new SAXBuilder("org.apache.xerces.parsers.SAXParser", true);

Note

JDOM does not currently distinguish between validity and well-formedness errors. I’m working on a patch for this. Of course, any malformed document is de facto invalid.
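
Until JDOM makes this distinction itself, one possible workaround is to drop down to SAX, where the ErrorHandler interface reports recoverable errors (which include validity errors) through error() and well-formedness errors through fatalError(). The following is a minimal sketch that bypasses JDOM entirely and uses whatever validating parser JAXP supplies. The class name ErrorClassifier and its classify() method are hypothetical, and the sketch treats every recoverable error as a validity error, which is a simplification.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JDOM
class ErrorClassifier {

  // Returns "valid", "invalid", or "malformed" for the given XML text
  static String classify(String xml) {

    final boolean[] invalid = {false};
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setValidating(true);
      XMLReader reader = factory.newSAXParser().getXMLReader();
      reader.setErrorHandler(new DefaultHandler() {
        // recoverable errors, which include validity errors;
        // note the problem and let the parser continue
        public void error(SAXParseException e) {
          invalid[0] = true;
        }
        // well-formedness errors; stop parsing
        public void fatalError(SAXParseException e)
         throws SAXParseException {
          throw e;
        }
      });
      reader.parse(new InputSource(new StringReader(xml)));
    }
    // thrown from fatalError() above
    catch (SAXParseException e) {
      return "malformed";
    }
    // I/O or configuration problems unrelated to the document itself
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return invalid[0] ? "invalid" : "valid";

  }

  public static void main(String[] args) {
    String dtd = "<!DOCTYPE a [<!ELEMENT a EMPTY>]>";
    System.out.println(classify(dtd + "<a/>"));
    System.out.println(classify("<a/>"));
    System.out.println(classify("<a><b></a>"));
  }

}
```

The first sample document has a DTD and satisfies it, the second has no DTD at all, and the third has a mismatched end-tag, so the three calls should exercise each of the three outcomes in turn.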


Copyright 2001, 2002 Elliotte Rusty Harold, elharo@metalab.unc.edu. Last Modified July 29, 2002.