The EntityReference Interface

The EntityReference interface represents a general entity reference such as &nbsp; or &copyright_notice;. (It is not used for the five predefined entity references &amp;, &lt;, &gt;, &apos;, and &quot;.)

Example 11.13 summarizes the EntityReference interface. You’ll notice it declares no methods of its own. All of its functionality is inherited from the Node super-interface. In an XML document, an entity reference is just a placeholder for the text that will replace it. In a DOM tree, an EntityReference object merely contains, as its children, the nodes the entity reference will be replaced by.

Example 11.13. The EntityReference interface

package org.w3c.dom;

public interface EntityReference extends Node {

}

The name of the entity reference is returned by the getNodeName() method. The replacement text for the entity (assuming the parser has resolved the entity) can be read through the usual methods of the Node interface, like getFirstChild(). However, entity references are read-only. You cannot change their children using methods like appendChild() or replaceChild(), and the Node interface offers no method for changing their names. Trying to modify the children throws a DOMException with the error code NO_MODIFICATION_ALLOWED_ERR.
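To make the read-only behavior concrete, here is a minimal sketch. The class name ReadOnlyReferenceDemo and its two helper methods are made up for this illustration, and it assumes a JAXP parser is available to supply an empty Document. It creates an entity reference, reads its name, and then reports the error code raised when the code tries to append a child.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.EntityReference;
import org.w3c.dom.Text;

// Hypothetical demo class; not part of DOM or of the book's examples
class ReadOnlyReferenceDemo {

  // Creates an entity reference with the given name in a new, empty document
  public static EntityReference makeReference(String name) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
      return doc.createEntityReference(name);
    }
    catch (ParserConfigurationException ex) {
      throw new IllegalStateException(ex);
    }
  }

  // Returns the DOMException code raised by trying to append a child
  // to the entity reference, or -1 if no exception was thrown
  public static short tryToModify(EntityReference ref) {
    try {
      Text extra = ref.getOwnerDocument().createTextNode("extra");
      ref.appendChild(extra);
      return -1;
    }
    catch (DOMException ex) {
      return ex.code;
    }
  }

  public static void main(String[] args) {
    EntityReference ref = makeReference("copyright_notice");
    System.out.println("Name: " + ref.getNodeName());
    short code = tryToModify(ref);
    System.out.println("Refused modification: "
        + (code == DOMException.NO_MODIFICATION_ALLOWED_ERR));
  }

}
```

Returning the error code rather than letting the exception propagate is only a convenience for the demonstration. Real code would normally let the DOMException surface.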

EntityReference objects do not know their own system ID (URL) or public ID. However, using the entity reference’s name, you can look up this information in the NamedNodeMap of Entity objects returned by the getEntities() method of the DocumentType interface. I’ll show you an example of this when we get to the Entity interface. In the meantime, let’s do an example that creates new entity references in the tree.
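As a brief preview of that lookup, the following sketch shows roughly what it involves. The class name EntityLookup, the method name findDeclaration(), and the inline test document with its footer entity are all assumptions made for this illustration. The sketch also assumes a JAXP parser that reports the entities declared in the internal DTD subset.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of DOM
class EntityLookup {

  // Returns the Entity declared under the reference's name, or null
  // if the document has no DTD or the DTD has no such declaration
  public static Entity findDeclaration(EntityReference ref) {
    DocumentType doctype = ref.getOwnerDocument().getDoctype();
    if (doctype == null) return null;
    NamedNodeMap entities = doctype.getEntities();
    if (entities == null) return null;
    return (Entity) entities.getNamedItem(ref.getNodeName());
  }

  public static void main(String[] args) throws Exception {
    // Made-up document that declares one external entity named footer
    String xml
     = "<!DOCTYPE doc [<!ENTITY footer SYSTEM \"footer.xml\">]><doc/>";
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    EntityReference ref = doc.createEntityReference("footer");
    Entity declaration = findDeclaration(ref);
    if (declaration != null) {
      System.out.println("System ID: " + declaration.getSystemId());
      System.out.println("Public ID: " + declaration.getPublicId());
    }
  }

}
```

Because the lookup goes through the owner document’s DocumentType, it returns null for documents that have no DOCTYPE at all.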

One common complaint about XML is that it doesn’t support the entity references like &nbsp; and &eacute; that developers are accustomed to from HTML. Using DOM, it’s straightforward to replace any inconvenient character with an entity reference, as Example 11.14 demonstrates. This program recursively descends the element tree looking for any non-breaking space characters (Unicode code point 0xA0). It replaces any it finds with an entity reference with the name nbsp. To do so, it has to split the text node around the non-breaking space.

Example 11.14. Inserting entity references into a document

import org.w3c.dom.*;


public class NBSPUtility {
  
  // Recursively descend the tree replacing all non-breaking
  // spaces with &nbsp;
  public static void addEntityReferences(Node node) {
    
    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) { 
      // split the text node around the first non-breaking space
      Text text = (Text) node;
      String s = text.getNodeValue();
      int nbsp = s.indexOf('\u00A0'); // finds the first A0
      if (nbsp != -1) {
        Text middle = text.splitText(nbsp);
        Text end = middle.splitText(1);
        Node parent = text.getParentNode();
        Document factory = text.getOwnerDocument();
        EntityReference ref = factory.createEntityReference("nbsp");
        parent.replaceChild(ref, middle);
        addEntityReferences(end); // finds any subsequent A0s
        System.out.println("Added");
      }
    } // end if 
    
    // Children of entity references are read-only, so skip them
    else if (type != Node.ENTITY_REFERENCE_NODE && node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        addEntityReferences(child);
      } // end for
    } // end if 
    
  }  // end addEntityReferences()
  
}

It would be easy enough to make it replace all the Latin-1 characters, or all the characters that have standard entity references in HTML, or some such. You’d just need to keep a table of the characters and their corresponding entity references. You could even build such a table from the entities map available from the DTD.
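A table-driven variant might look something like the following sketch. The class name EntityTableUtility and the three-entry table are assumptions chosen for illustration; a real table would list every character you care about, using names the document’s DTD actually declares. The sketch does not descend into existing entity references because their children are read-only.

```java
import java.util.HashMap;
import java.util.Map;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Hypothetical generalization of NBSPUtility
class EntityTableUtility {

  // Illustrative table: a few Latin-1 characters and their HTML names
  private static final Map<Character, String> NAMES
   = new HashMap<Character, String>();
  static {
    NAMES.put('\u00A0', "nbsp");
    NAMES.put('\u00A9', "copy");
    NAMES.put('\u00E9', "eacute");
  }

  // Recursively descend the tree replacing every character in the
  // table with an entity reference of the corresponding name
  public static void addEntityReferences(Node node) {

    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      Text text = (Text) node;
      String s = text.getNodeValue();
      for (int i = 0; i < s.length(); i++) {
        String name = NAMES.get(s.charAt(i));
        if (name == null) continue;
        // split the text node around the first character in the table
        Text middle = text.splitText(i);
        Text end = middle.splitText(1);
        Document factory = text.getOwnerDocument();
        text.getParentNode().replaceChild(
         factory.createEntityReference(name), middle);
        addEntityReferences(end); // handles the rest of the text
        return;
      }
    }
    // Children of entity references are read-only, so skip them
    else if (type != Node.ENTITY_REFERENCE_NODE) {
      for (Node child = node.getFirstChild(); child != null;
       child = child.getNextSibling()) {
        addEntityReferences(child);
      }
    }

  }

}
```

Like the single-character version, this leaves an empty text node between two adjacent replaced characters. Calling normalize() on the parent afterwards is one way to tidy that up.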

Although this code runs, the documents it produces are not necessarily well-formed. In particular, only entities declared in the DTD should be used. Assuming that’s the case, the child list of the entity reference will be automatically filled in with the entity’s replacement text. Unfortunately, however, DOM does not offer any means of defining new entities that are not part of the document’s original DTD.
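One way to guard against undeclared entities is to check the DTD’s entity map before creating a reference. The sketch below does that. The class name DeclaredEntities and the method name createIfDeclared() are made up for this illustration, and the sketch checks only whether a declaration is present, not what its replacement text is.

```java
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.EntityReference;
import org.w3c.dom.NamedNodeMap;

// Hypothetical helper class; not part of DOM
class DeclaredEntities {

  // Returns a new reference to the named entity, or null if the
  // document's DTD does not declare an entity with that name
  public static EntityReference createIfDeclared(Document doc,
   String name) {

    DocumentType doctype = doc.getDoctype();
    if (doctype == null) return null; // no DTD at all
    NamedNodeMap entities = doctype.getEntities();
    if (entities == null || entities.getNamedItem(name) == null) {
      return null; // no declaration for this name
    }
    return doc.createEntityReference(name);

  }

}
```

A program like NBSPUtility could call this method instead of calling createEntityReference() directly, and leave the character alone when the method returns null.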


Copyright 2001, 2002 Elliotte Rusty Harold, elharo@metalab.unc.edu. Last Modified January 04, 2002