The Entity Interface

The Entity interface represents a parsed or unparsed general entity declared in a document’s DTD. (DOM does not expose parameter entities.) A map of the entities declared in a document is available from the getEntities() method of the DocumentType interface. However, entities are not part of the tree structure. The parent of an entity is always null.
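
As a rough sketch of that lookup, the following class parses a small inline document, walks the map returned by getEntities(), and collects the names of the entities whose parent is null. The class name EntityMapDemo, its helper methods, and the sample document are invented for illustration. The sketch assumes the default JAXP parser and a DOM implementation that follows the specification on this point.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the DOM API
class EntityMapDemo {

  // Illustrative document that declares two internal general entities
  static final String SAMPLE =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE doc [\n"
    + "  <!ENTITY mdash \"&#x2014;\">\n"
    + "  <!ENTITY copyright \"Copyright 2002\">\n"
    + "]>\n"
    + "<doc>&copyright;</doc>";

  // Parse a document held in a string, wrapping checked exceptions
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      return builder.parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
  }

  // Return the names of the declared entities whose parent is null
  static List<String> parentlessEntities(Document doc) {
    List<String> names = new ArrayList<>();
    DocumentType doctype = doc.getDoctype();
    if (doctype == null) return names; // no DOCTYPE, so no entity map
    NamedNodeMap entities = doctype.getEntities();
    for (int i = 0; i < entities.getLength(); i++) {
      Node entity = entities.item(i);
      if (entity.getParentNode() == null) {
        names.add(entity.getNodeName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(parentlessEntities(parse(SAMPLE)));
  }

}
```

The order in which a NamedNodeMap returns its items is not specified, so the sketch makes no assumption about which name comes first.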

An Entity object represents the actual storage unit. It does not represent the entity reference such as &Omega; or &copyright; that appears in the instance document, but rather the replacement text that reference points to. For parsed entities that the XML parser has resolved, the descendants of the Entity object form a read-only tree containing the XML markup the entity reference stands for. For unparsed entities and external entities that the XML parser has not read, the Entity object has no children.

Example 11.22 summarizes the Entity interface. This interface includes methods to get the public ID, system ID, and notation name for the entity. Each of these methods returns null if the property does not apply to the entity. To get the replacement text of an entity, use the methods Entity inherits from its Node superinterface, such as hasChildNodes() and getFirstChild().

Example 11.22. The Entity interface

package org.w3c.dom;

public interface Entity extends Node {

  public String getPublicId();
  public String getSystemId();
  public String getNotationName();

}
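
The three getter methods are enough to tell the kinds of general entity apart. A minimal sketch, assuming the default JAXP parser: the class EntityDescriber, its describe() method, and the sample DTD (including the chap1 and logo entities and the gif notation) are hypothetical names made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names here are illustrative only
class EntityDescriber {

  // One internal, one external parsed, and one unparsed entity
  static final String SAMPLE =
      "<!DOCTYPE doc [\n"
    + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
    + "  <!ENTITY mdash \"&#x2014;\">\n"
    + "  <!ENTITY chap1 SYSTEM \"chap1.xml\">\n"
    + "  <!ENTITY logo PUBLIC \"-//Example//Logo//EN\" \"logo.gif\" NDATA gif>\n"
    + "]>\n"
    + "<doc/>";

  // Classify the named entity by which of its properties are null
  static String describe(String xml, String entityName) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse the document", e);
    }

    DocumentType doctype = doc.getDoctype();
    Entity entity = (doctype == null) ? null
        : (Entity) doctype.getEntities().getNamedItem(entityName);
    if (entity == null) return entityName + ": not declared";

    // Only unparsed entities have a notation name
    if (entity.getNotationName() != null) {
      return entityName + ": unparsed entity, notation "
          + entity.getNotationName();
    }
    // External parsed entities have a system ID but no notation
    if (entity.getSystemId() != null) {
      return entityName + ": external parsed entity, system ID "
          + entity.getSystemId();
    }
    // Internal entities have neither
    return entityName + ": internal entity";
  }

  public static void main(String[] args) {
    String[] names = {"mdash", "chap1", "logo"};
    for (String name : names) {
      System.out.println(describe(SAMPLE, name));
    }
  }

}
```

The notation name is tested first because an unparsed entity also has a system ID. Testing the system ID first would misreport it as an external parsed entity.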

For an example, let’s look at a program that walks the document looking for entity references. Every time it sees one, it prints the name, public ID, and system ID of the entity that reference points to. To do this, it has to look up the entity reference’s name in the entities map returned by the getEntities() method of the DocumentType interface. A java.util.Set keeps track of which entities have been printed so that no entity is printed more than once.

Example 11.23. Listing parsed entities used in the document

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.util.*;


public class EntityLister {

  // Store the names of the entities that have already been printed
  private Set<String>  printed = new HashSet<String>();
  private NamedNodeMap entities;
  
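  // Look up the document's entity map, then walk the tree
  public void printEntities(Document doc) {
    
    DocumentType doctype = doc.getDoctype();
    // A document without a DOCTYPE declares no entities to report
    if (doctype == null) return;
    entities = doctype.getEntities();
    seekEntities(doc);
    
  }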

  // Recursively descend the tree, reporting each entity reference
  private void seekEntities(Node node) {
    
    int type = node.getNodeType();
    if (type == Node.ENTITY_REFERENCE_NODE) {
      EntityReference ref = (EntityReference) node;
      printEntityReference(ref);
    }
    
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        seekEntities(children.item(i));
      } 
    }
    
  }  
  
  private void printEntityReference(EntityReference ref) {
    
    String name = ref.getNodeName();
    if (!printed.contains(name)) {
      
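      Entity entity   = (Entity) entities.getNamedItem(name);
      if (entity == null) {
        // The reference names an entity that is not in the map,
        // for example one declared in a DTD the parser did not read
        System.out.println(name + ": undeclared entity");
        printed.add(name);
        return;
      }
      String publicID = entity.getPublicId();
      String systemID = entity.getSystemId();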

      System.out.print(name + ": ");
      if (publicID != null) System.out.print(publicID + " ");
      if (systemID != null) System.out.print(systemID + " ");
      else { // Internal entities do not have system IDs
        System.out.print("internal entity");
      }
      System.out.println();
      
      printed.add(name);
    }
    
  }
  
  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java EntityLister URL");
      return;
    }
    String url = args[0];
    
    try {
      DocumentBuilderFactory factory 
       = DocumentBuilderFactory.newInstance();
       
      // By default JAXP does not include entity reference nodes
      // in the tree. You have to explicitly request them by 
      // telling DocumentBuilderFactory not to expand entity
      // references.
      factory.setExpandEntityReferences(false);
      DocumentBuilder parser = factory.newDocumentBuilder();
      
      // Read the document
      Document document = parser.parse(url); 
     
      // Print the entities
      EntityLister lister = new EntityLister();
      lister.printEntities(document);

    }
    catch (SAXException e) {
      System.out.println(url + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not read " + url
      ); 
    }
    catch (FactoryConfigurationError e) { 
      System.out.println("Could not locate a factory class"); 
    }
    catch (ParserConfigurationException e) { 
      System.out.println("Could not locate a JAXP parser"); 
    } 
     
  } // end main
  
}

Mostly this is fairly straightforward tree-walking code of the sort you’ve seen several times before. However, you should note that by default JAXP DocumentBuilder objects do not put any entity reference nodes in the trees they build. To get these, you must call setExpandEntityReferences(false) on the DocumentBuilderFactory before it creates the DocumentBuilder.
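
The effect of that setting can be seen by parsing the same document twice and counting the entity reference nodes each time. This is only a sketch: the class ExpandFlagDemo, its countReferences() method, and the sample document are hypothetical, and the counts assume a parser that honors the setting in the way JAXP documents it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative only
class ExpandFlagDemo {

  // Illustrative document containing two references to one entity
  static final String SAMPLE =
      "<!DOCTYPE doc [ <!ENTITY mdash \"&#x2014;\"> ]>"
    + "<doc>a&mdash;b&mdash;c</doc>";

  // Parse with the given setting and count entity reference nodes
  static int countReferences(String xml, boolean expand) {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setExpandEntityReferences(expand);
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return count(doc);
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
  }

  // Recursively count the entity reference nodes at or below this node
  private static int count(Node node) {
    int total = (node.getNodeType() == Node.ENTITY_REFERENCE_NODE) ? 1 : 0;
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      total += count(children.item(i));
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println("expanded:     " + countReferences(SAMPLE, true));
    System.out.println("not expanded: " + countReferences(SAMPLE, false));
  }

}
```

With expansion left on, the references are replaced by ordinary text and the count should be zero. With expansion turned off, each reference in the source should appear as its own node.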

Here’s the output when I ran this across the DocBook source for this chapter. All the entity references used here are internally defined references to single hard-to-type characters like curly quotes and the em dash.

D:\books\XMLJAVA>java EntityLister file://D/books/XMLJava/ch11.xml
rsquo: internal entity
mdash: internal entity
hellip: internal entity

Copyright 2001, 2002 Elliotte Rusty Harold | elharo@metalab.unc.edu | Last Modified January 10, 2002