Receiving Locators

For debugging purposes, it’s often useful to know exactly where a particular element or other item appears. To provide this information, parsers should (but are not required to) supply an object that implements the Locator interface. A Locator object knows at which point in which file the latest event was fired.

Example 6.15 summarizes the Locator interface. It offers both the public and system identifiers of the entity in which the start-tag, end-tag, processing instruction, skipped entity, or other item was found. Furthermore, it tells you approximately at which line, and at which column in that line, the item ends: SAX specifies that the reported position is that of the first character after the text that triggered the event. Lines and columns both begin at 1. If this information isn’t available for some reason (most commonly a well-formedness error very early in the document), these methods normally return -1.

Example 6.15. The SAX Locator interface

package org.xml.sax;


public interface Locator {
        
  public String getPublicId();
  public String getSystemId();
  public int    getLineNumber();
  public int    getColumnNumber();
    
}

The public identifier is normally a string like “-//OASIS//DTD DocBook XML V4.1.2//EN”. However, it’s often absent, in which case getPublicId() returns null. The system identifier is almost always an absolute or relative URL, if it’s known at all. However, in some contexts no URL is available. For instance, the ContentHandler may be receiving its events from a JDOM SAXOutputter object or from a stream of unknown origin rather than from a parser reading a file or a network connection. In this case, getSystemId() may return null too. Since XML documents can be divided across multiple files, different items may appear at different URLs, though well-formedness guarantees that every start-tag has the same URL as its corresponding end-tag.
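
Because any of the four methods can report “unknown”, code that prints locations usually has to check for null and -1 first. The following is a minimal sketch of such a check. The LocationFormatter class and its describe() method are hypothetical names invented for this illustration, not part of SAX, and the example.com URL is a placeholder. It uses org.xml.sax.helpers.LocatorImpl only as a convenient way to build Locator objects by hand.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical helper (not part of SAX): renders a Locator as readable text,
// leaving out whatever the parser could not supply.
class LocationFormatter {

  static String describe(Locator locator) {

    // The parser may not have supplied a Locator at all
    if (locator == null) return "unknown location";

    StringBuilder result = new StringBuilder();

    // getSystemId() and getPublicId() return null when unknown
    String systemId = locator.getSystemId();
    result.append(systemId != null ? systemId : "unknown source");
    String publicId = locator.getPublicId();
    if (publicId != null) {
      result.append(" (public ID ").append(publicId).append(")");
    }

    // getLineNumber() and getColumnNumber() return -1 when unknown
    int line = locator.getLineNumber();
    int column = locator.getColumnNumber();
    if (line != -1) result.append(", line ").append(line);
    if (column != -1) result.append(", column ").append(column);

    return result.toString();
  }

  public static void main(String[] args) {

    // A hand-built Locator with a placeholder URL
    LocatorImpl known = new LocatorImpl();
    known.setSystemId("http://www.example.com/doc.xml");
    known.setLineNumber(10);
    known.setColumnNumber(13);
    System.out.println(describe(known));

    // A Locator for which nothing is known
    LocatorImpl unknown = new LocatorImpl();
    unknown.setLineNumber(-1);
    unknown.setColumnNumber(-1);
    System.out.println(describe(unknown));
  }

}
```

Whether to omit unknown values, as this sketch does, or to print them as -1 and null is a matter of taste. Omitting them tends to produce friendlier messages.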

If the parser does provide location information, then it will invoke the ContentHandler’s setDocumentLocator() method before calling startDocument(). If you want location information to be available later, then you need to store a reference to this object somewhere, typically in a field. For example,

  private Locator locator;
  
  public void setDocumentLocator(Locator locator) {
    this.locator = locator;
  }

If the ContentHandler is used to parse multiple documents, you’ll receive a new Locator object for each document. That Locator is only good while its document is being parsed. Once the endDocument() method has been invoked, the Locator may not return sensible results.
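
If you need a position after parsing has finished, one option is to copy the Locator’s current values at the moment the event fires. The following sketch does this with the copy constructor of org.xml.sax.helpers.LocatorImpl. The LocationRecorder class and its record() method are hypothetical names chosen for this illustration. It also assumes a JAXP SAXParserFactory is used to obtain the XMLReader and that the parser supplies a Locator at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical handler: remembers where each start-tag was reported
class LocationRecorder extends DefaultHandler {

  private Locator locator;
  private final List<Locator> startTagLocations = new ArrayList<>();

  @Override
  public void setDocumentLocator(Locator locator) {
    this.locator = locator;
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    // Copy the current values; the live Locator keeps changing as
    // the parse proceeds and is unreliable once the parse is over
    if (locator != null) {
      startTagLocations.add(new LocatorImpl(locator));
    }
  }

  List<Locator> getStartTagLocations() {
    return startTagLocations;
  }

  // Parses the given XML text and returns one snapshot per start-tag
  static List<Locator> record(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();
    LocationRecorder handler = new LocationRecorder();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.getStartTagLocations();
  }

  public static void main(String[] args) throws Exception {
    // The snapshots are read here, after parsing has finished
    for (Locator saved : record("<a>\n<b/>\n</a>")) {
      System.out.println("start-tag reported at line "
       + saved.getLineNumber() + "; column " + saved.getColumnNumber());
    }
  }

}
```

Each snapshot is an independent object, so it stays usable after endDocument(), at the cost of one small allocation per recorded event.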

Example 6.16 demonstrates this interface with a program that prints the line and column numbers for each method invocation. To run this program you need a parser that provides location information. Most parsers, including Xerces, do.

Example 6.16. Determining the locations of events

import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;


public class LocatorDemo implements ContentHandler {

  private Locator locator;
  
  public void setDocumentLocator(Locator locator) {
    this.locator = locator;
  }
  
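  private void printLocation(String s) { 
    
    // Parsers are not required to supply a Locator
    if (locator == null) {
      System.out.println(s + " at unknown location");
      return;
    }
    int line = locator.getLineNumber();
    int column = locator.getColumnNumber();
    System.out.println(
     s + " at line " + line + "; column " + column
    );
    
  }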

  public void startDocument() { 
    printLocation("startDocument()");
  }

  public void endDocument() { 
    printLocation("endDocument()");
  }

  public void startElement(String namespaceURI, String localName, 
   String qualifiedName, Attributes atts) {
    printLocation("startElement()");
  }

  public void endElement(String namespaceURI, String localName, 
  String qualifiedName) {
    printLocation("endElement()");
  }
  
  public void characters(char[] text, int start, int length) {
    printLocation("characters()"); 
  }  
  
  public void startPrefixMapping(String prefix, String uri) {
    printLocation("startPrefixMapping()"); 
  }
  
  public void endPrefixMapping(String prefix) {
    printLocation("endPrefixMapping()"); 
  }
  
  public void ignorableWhitespace(char[] text, int start, 
   int length) {
    printLocation("ignorableWhitespace()");  
  }
  
  public void processingInstruction(String target, String data) {
    printLocation("processingInstruction()");  
  }
  
  public void skippedEntity(String name) {
    printLocation("skippedEntity()");     
  }  
  
  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java LocatorDemo URL"); 
      return;
    } 
    String uri = args[0];
    
    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      
      // Install the ContentHandler   
      ContentHandler handler = new LocatorDemo();   
      parser.setContentHandler(handler);
      parser.parse(uri);

    }
    catch (Exception e) {
      System.err.println(e);
    }
        
  } // end main
   
} // end LocatorDemo

Here’s the output when Example 6.13 is fed into this program:

C:\XMLJAVA>java LocatorDemo SymbolLookup.xml
startDocument() at line 1; column 1
startElement() at line 10; column 13
ignorableWhitespace() at line 11; column 3
startElement() at line 11; column 15
characters() at line 11; column 27
endElement() at line 11; column 41
ignorableWhitespace() at line 12; column 3
startElement() at line 12; column 11
ignorableWhitespace() at line 13; column 5
startElement() at line 13; column 12
ignorableWhitespace() at line 14; column 7
startElement() at line 14; column 14
ignorableWhitespace() at line 15; column 9
startElement() at line 15; column 17
characters() at line 17; column 9
endElement() at line 17; column 19
ignorableWhitespace() at line 18; column 7
endElement() at line 18; column 16
ignorableWhitespace() at line 19; column 5
endElement() at line 19; column 14
ignorableWhitespace() at line 20; column 3
endElement() at line 20; column 13
ignorableWhitespace() at line 21; column 1
endElement() at line 21; column 15
endDocument() at line -1; column -1
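
A stored Locator is also handy for reporting problems your own handler detects, since SAXParseException has a constructor that takes a message and a Locator and copies the current position into the exception. The following is a rough sketch under several assumptions: the ForbiddenElementChecker class, its firstProblem() method, and the element name “forbidden” are all hypothetical, invented for this illustration. It further assumes the parser is obtained through JAXP and passes an exception thrown by the handler back to the caller of parse() unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rejects documents containing a "forbidden" element
class ForbiddenElementChecker extends DefaultHandler {

  private Locator locator;

  @Override
  public void setDocumentLocator(Locator locator) {
    this.locator = locator;
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) throws SAXException {
    if ("forbidden".equals(localName)) {
      // The exception copies the Locator's current values; if the
      // Locator is null, the exception reports -1 for line and column
      throw new SAXParseException(
       "Unexpected element " + localName, locator);
    }
  }

  // Returns the first problem found, or null if there is none
  static SAXParseException firstProblem(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();
    parser.setContentHandler(new ForbiddenElementChecker());
    try {
      parser.parse(new InputSource(new StringReader(xml)));
      return null;
    }
    catch (SAXParseException e) {
      return e;
    }
  }

  public static void main(String[] args) throws Exception {
    SAXParseException problem
     = firstProblem("<doc>\n<ok/>\n<forbidden/>\n</doc>");
    if (problem != null) {
      System.out.println(problem.getMessage() + " at line "
       + problem.getLineNumber() + "; column "
       + problem.getColumnNumber());
    }
  }

}
```

Because the exception carries its own copy of the position, the caller can still read it after parsing has been abandoned.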

Copyright 2001, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified October 16, 2001