DOM

The Document Object Model defines a tree-based representation of XML documents. The org.w3c.dom package contains the basic node classes that represent the different components that make up the tree. The org.w3c.dom.traversal package includes some useful utility classes for navigating, searching, and querying the tree.

DOM Level 2, the version described here, is incomplete. It does not define how a DOMImplementation is loaded, how a document is parsed, or how a document is serialized. For the moment, JAXP provides a stopgap solution. Eventually, DOM Level 3 will fill in these holes. However, since DOM Level 3 was far from complete at the time of this writing, this appendix covers DOM Level 2 exclusively.

The DOM Data Model

Table A.1 summarizes the DOM data model with the name, value, parent, and possible children for each kind of node.

Table A.1. Node properties

| Node type | name | value | parent | children |
| --- | --- | --- | --- | --- |
| Document | #document | null | null | Comment, processing instruction, zero or one document type, one element |
| DocumentType | Root element name specified by the DOCTYPE declaration | null | Document | none |
| Element | prefixed name | null | Element, Document, or Document fragment | Comment, Processing Instruction, Text, Element, Entity reference, CDATA section |
| Text | #text | text of the node | Element, Attr, Entity, or Entity reference | none |
| Attr | prefixed name | normalized attribute value | Element | Text, Entity reference |
| Comment | #comment | text of comment | Element, Document, or Document fragment | none |
| Processing Instruction | target | data | Element, Document, or Document fragment | none |
| Entity Reference | name | null | Element or Document Fragment | Comment, Processing Instruction, Text, Element, Entity reference, CDATA section |
| Entity | entity name | null | null | Comment, Processing Instruction, Text, Element, Entity Reference, CDATA section |
| CDATA section | #cdata-section | text of the section | Element, Entity, or Entity reference | none |
| Notation | notation name | null | null | none |
| Document fragment | #document-fragment | null | null | Comment, Processing Instruction, Text, Element, Entity reference, CDATA section |

Keep in mind the parts of the XML document that are not exposed in this data model:

  • The XML declaration, including the version, standalone, and encoding declarations. These will be added as properties of the document node in DOM Level 3, but they are not provided by current parsers.

  • Most information from the DTD and/or schema, including element and attribute types and content models. DOM Level 3 will add some of this.

  • Any white space outside the root element.

  • Whether or not each character was provided by a character reference. Parsers may provide information about entity references, but are not required to do so.

A DOM program cannot manipulate any of these constructs. It cannot, for example, read in an XML document, then write it out again in the same encoding the original document used because it doesn’t know what encoding the original document used. It cannot treat $var differently than &#x24;var because it doesn’t know which was originally written.

org.w3c.dom

The org.w3c.dom package contains the core interfaces that are used to form DOM documents. Node is the common superinterface all these node types share. In addition, this package contains a few data structures used to hold collections of DOM nodes and one exception class.

Attr

The Attr interface represents an attribute node. Its node properties are defined as follows:

node name: The full name of the attribute, including a prefix and a colon if the attribute is in a namespace
node value: The attribute’s normalized value
local name: The local part of the attribute’s name
namespace URI: The namespace URI of the attribute, or null if the attribute does not have a prefix
namespace prefix: The namespace prefix of the attribute, or null if the attribute is not in a namespace

Attr objects are not part of the tree. They have neither parents nor siblings. getParentNode(), getPreviousSibling(), and getNextSibling() all return null when invoked on an Attr object. Attr objects do have children (Text and EntityReference objects), but it’s generally best to ignore this fact and just use the getValue() method to read the value of an attribute.

package org.w3c.dom;

public interface Attr extends Node {

  public String  getName();
  public boolean getSpecified();
  public String  getValue();
  public void    setValue(String value) throws DOMException;
  public Element getOwnerElement();

}
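
The following is a minimal sketch of these points, assuming the JAXP parser bundled with the JDK. The class name AttrDemo and the sample document are illustrative only; they are not part of DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttrDemo {

  // Summarizes an attribute: name, value, parent, and owner element
  static String describe(Attr attr) {
    return attr.getName() + "=" + attr.getValue()
     + " parent=" + attr.getParentNode()
     + " owner=" + attr.getOwnerElement().getTagName();
  }

  // Parses a one-element sample document and describes its id attribute
  static String demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader("<item id='p1'>widget</item>")));
    return describe(doc.getDocumentElement().getAttributeNode("id"));
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```

The attribute is reached through its element and getOwnerElement() leads back to that element, while getParentNode() returns null.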

CDATASection

The CDATASection interface represents a CDATA section. DOM parsers are not required to use this interface to report CDATA sections. They may just use Text objects to report the content of CDATA sections. Do not write code that depends on recognizing CDATA sections in text. The node properties of CDATASection are defined as follows:

node name: #cdata-section
node value: The text of the CDATA section
local name: null
namespace URI: null
namespace prefix: null

package org.w3c.dom;

public interface CDATASection extends Text {

}
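
One way to avoid depending on CDATA sections is to test for Text, the superinterface, since that test matches both plain text nodes and CDATASection nodes. The sketch below assumes the JAXP parser bundled with the JDK; the class name CDATADemo and the helper directText() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class CDATADemo {

  // Concatenates the character data directly inside an element.
  // CDATASection extends Text, so one instanceof test covers both.
  static String directText(Element element) {
    StringBuilder result = new StringBuilder();
    for (Node child = element.getFirstChild(); child != null;
     child = child.getNextSibling()) {
      if (child instanceof Text) result.append(child.getNodeValue());
    }
    return result.toString();
  }

  static String demo() throws Exception {
    String xml = "<code>if (a <![CDATA[<]]> b)</code>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
    return directText(doc.getDocumentElement());
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```

The result is the same whether the parser reports the CDATA section as its own node or merges it into the surrounding text.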

CharacterData

The CharacterData interface is the generic superinterface for those nodes composed of plain text: Comment, Text, and CDATASection. All actual instances of CharacterData should be instances of one of these subinterfaces. The node properties depend on the specific subinterface.

package org.w3c.dom;

public interface CharacterData extends Node {

  public String getData() throws DOMException;
  public void   setData(String data) throws DOMException;
  public int    getLength();
  public String substringData(int offset, int count) 
   throws DOMException;
  public void   appendData(String s) throws DOMException;
  public void   insertData(int offset, String s) 
   throws DOMException;
  public void   deleteData(int offset, int count) 
   throws DOMException;
  public void   replaceData(int offset, int count, String s) 
   throws DOMException;

}
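
As a rough illustration of the editing methods, the sketch below applies each of them in turn to a single text node. Offsets count from zero. The class name CharacterDataDemo is an assumption made for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;

class CharacterDataDemo {

  static String edit() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    CharacterData data = doc.createTextNode("Hello World");
    data.appendData("!");           // Hello World!
    data.insertData(5, ",");        // Hello, World!
    data.replaceData(7, 5, "DOM");  // Hello, DOM!
    data.deleteData(5, 1);          // Hello DOM!
    return data.getData() + "|" + data.substringData(0, 5)
     + "|" + data.getLength();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(edit());
  }

}
```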

Comment

The Comment interface represents a comment node. All its methods are inherited from the CharacterData and Node superinterfaces. Its node properties are defined as follows:

node name: #comment
node value: The text of the comment, not including <!-- and -->
local name: null
namespace URI: null
namespace prefix: null

package org.w3c.dom;

public interface Comment extends CharacterData {

}

Document

The Document interface represents the root node of the tree. It also serves as an abstract factory to create the other kinds of nodes (element, attribute, comment, etc.) which will be stored in the tree. Its node properties are defined as follows:

node name: #document
node value: null
local name: null
namespace URI: null
namespace prefix: null

package org.w3c.dom;

public interface Document extends Node {

  public DocumentType          getDoctype();
  public DOMImplementation     getImplementation();
  public Element               getDocumentElement();
  
  public Element               createElement(String tagName) 
   throws DOMException;
  public Element               createElementNS(
   String namespaceURI, String qualifiedName) throws DOMException;
  public Attr                  createAttribute(String name) 
   throws DOMException;
  public Attr                  createAttributeNS(
   String namespaceURI, String qualifiedName) throws DOMException;
  public DocumentFragment      createDocumentFragment();
  public Text                  createTextNode(String data);
  public Comment               createComment(String data);
  public CDATASection          createCDATASection(String data) 
   throws DOMException;
  public ProcessingInstruction createProcessingInstruction(
   String target, String data) throws DOMException;
  public EntityReference       createEntityReference(String name) 
   throws DOMException;

  public NodeList getElementsByTagName(String tagName);
  public Node     importNode(Node importedNode, boolean deep) 
   throws DOMException;
  public NodeList getElementsByTagNameNS(String namespaceURI, 
   String localName);
  public Element  getElementById(String id);

}
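
The factory role of Document can be sketched as follows. Nodes created this way belong to the document but are not in the tree until they are appended somewhere. The class name, namespace URI, and element names here are made up for illustration, and the empty Document comes from JAXP since DOM Level 2 does not say how to obtain one.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentFactoryDemo {

  // Hypothetical namespace used only for this example
  static final String NS = "http://www.example.com/ns";

  static Document build() throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    Document doc = factory.newDocumentBuilder().newDocument();

    // Each create method returns a detached node owned by doc
    Element order = doc.createElementNS(NS, "ex:order");
    doc.appendChild(order);
    order.appendChild(doc.createComment(" built in memory "));
    Element item = doc.createElementNS(NS, "ex:item");
    item.appendChild(doc.createTextNode("widget"));
    order.appendChild(item);
    return doc;
  }

  public static void main(String[] args) throws Exception {
    Document doc = build();
    System.out.println(doc.getDocumentElement().getNodeName());
    System.out.println(doc.getElementsByTagNameNS(NS, "item").getLength());
  }

}
```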

DocumentFragment

The DocumentFragment interface is used to hold lists of element, text, comment, CDATA section, and processing instruction nodes when those nodes do not have a parent. It’s convenient for cutting and pasting or inserting and moving fragments of an XML document that don’t necessarily contain a single element. Its node properties are defined as follows:

node name: #document-fragment
node value: null
local name: null
namespace URI: null
namespace prefix: null

package org.w3c.dom;

public interface DocumentFragment extends Node {

}

This interface is for advanced use only. DOM trees created by a parser won’t contain any DocumentFragment objects and adding a DocumentFragment to a Document actually adds the contents of the fragment instead.
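
A small sketch of that behavior follows, with a hypothetical class name and element names. Appending the fragment moves its children into the target element and leaves the fragment empty.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

class FragmentDemo {

  // Returns { children of list, children left in the fragment }
  static int[] demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    Element list = doc.createElement("list");
    doc.appendChild(list);

    DocumentFragment fragment = doc.createDocumentFragment();
    fragment.appendChild(doc.createElement("a"));
    fragment.appendChild(doc.createElement("b"));

    // The fragment itself is not inserted; its two children are
    list.appendChild(fragment);
    return new int[] {
      list.getChildNodes().getLength(),
      fragment.getChildNodes().getLength()
    };
  }

  public static void main(String[] args) throws Exception {
    int[] counts = demo();
    System.out.println(counts[0] + " in list, " + counts[1] + " in fragment");
  }

}
```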

DocumentType

The DocumentType interface represents a document type declaration. It contains the root element name it declares, the system ID and public ID for the external DTD subset, and the complete internal DTD subset as a String. It also contains lists of the notations and general entities declared in the DTD. Other than this it contains no information from the DTD. The node properties of a DocumentType object are defined as follows:

node name: declared root element name
node value: null
local name: null
namespace URI: null
namespace prefix: null

package org.w3c.dom;

public interface DocumentType extends Node {

  public String getName();
  public String getPublicId();
  public String getSystemId();
  public String getInternalSubset();

  public NamedNodeMap getEntities();
  public NamedNodeMap getNotations();

  
}

In DOM Level 2, the entire DocumentType object is read-only. No part of it can be modified. Furthermore, a Document object’s DocumentType cannot be changed after the Document object is created. This restriction is lifted in DOM Level 3.

DOM does not provide any representation of the document type definition as distinguished from the document type declaration.
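
As an illustration, the sketch below parses a document with a small internal DTD subset and reads what DocumentType exposes. It assumes the JAXP parser bundled with the JDK; the class name and the sample DTD are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DocumentTypeDemo {

  static final String XML =
     "<!DOCTYPE note [\n"
   + "  <!ELEMENT note (#PCDATA)>\n"
   + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
   + "]>\n"
   + "<note>hi</note>";

  static DocumentType parseDoctype() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(XML)));
    return doc.getDoctype();
  }

  // Root element name, system ID, and whether the gif notation is listed
  static String demo() throws Exception {
    DocumentType doctype = parseDoctype();
    boolean hasGif = doctype.getNotations().getNamedItem("gif") != null;
    return doctype.getName() + "|" + doctype.getSystemId() + "|" + hasGif;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
    // The internal subset is available only as an unparsed string
    System.out.println(parseDoctype().getInternalSubset());
  }

}
```

The element declaration is visible only inside the string returned by getInternalSubset(); there is no object model for it.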

DOMImplementation

DOMImplementation is an abstract factory used to create new Document and DocumentType objects. The javax.xml.parsers.DocumentBuilder class can create new DOMImplementation objects.

package org.w3c.dom;

public interface DOMImplementation  {

  public DocumentType createDocumentType(String qualifiedName, 
   String publicID, String systemID) throws DOMException;
  public Document     createDocument(String namespaceURI, 
   String qualifiedName, DocumentType doctype) throws DOMException;

  public boolean hasFeature(String feature, String version);

}
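
A minimal sketch of this factory in use, assuming JAXP is available to supply the DOMImplementation through DocumentBuilder.getDOMImplementation(). The class name is illustrative; the XHTML identifiers are used only as familiar sample values.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

class DOMImplementationDemo {

  static Document createXHTML() throws Exception {
    DOMImplementation impl = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().getDOMImplementation();
    // The doctype must be supplied when the document is created
    DocumentType doctype = impl.createDocumentType("html",
     "-//W3C//DTD XHTML 1.0 Strict//EN",
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
    return impl.createDocument("http://www.w3.org/1999/xhtml",
     "html", doctype);
  }

  public static void main(String[] args) throws Exception {
    Document doc = createXHTML();
    System.out.println(doc.getDoctype().getName());
    System.out.println(doc.getDocumentElement().getNamespaceURI());
    System.out.println(doc.getImplementation().hasFeature("Core", "2.0"));
  }

}
```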

Element

The Element interface represents an element node. The most important methods for this interface are inherited from the Node superinterface. Its node properties are defined as follows:

node name: The qualified name of the element, possibly including a prefix and a colon
node value: null
local name: The local part of the element name
namespace URI: The namespace URI of the element, or null if this element is not in a namespace
namespace prefix: The namespace prefix of the element, or null if this element is in the default namespace or no namespace at all

package org.w3c.dom;

public interface Element extends Node {

  public String   getTagName();
  public NodeList getElementsByTagNameNS(String namespaceURI, 
   String localName);
  public NodeList getElementsByTagName(String name);

  public String  getAttribute(String name);
  public void    setAttribute(String name, String value) 
   throws DOMException;
  public void    removeAttribute(String name) 
   throws DOMException;
  public Attr    getAttributeNode(String name);
  public Attr    setAttributeNode(Attr newAttr) 
   throws DOMException;
  public Attr    removeAttributeNode(Attr oldAttr) 
   throws DOMException;
  public String  getAttributeNS(String namespaceURI, 
   String localName);
  public void    setAttributeNS(String namespaceURI, 
   String qualifiedName, String value) throws DOMException;
  public void    removeAttributeNS(String namespaceURI, 
   String localName) throws DOMException;
  public Attr    getAttributeNodeNS(String namespaceURI, 
   String localName);
  public Attr    setAttributeNodeNS(Attr newAttr) 
   throws DOMException;
  public boolean hasAttribute(String name);
  public boolean hasAttributeNS(String namespaceURI, 
   String localName);

}
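
The attribute methods can be sketched like this. Note that getAttribute() returns an empty string rather than null for a missing attribute, so hasAttribute() is the way to tell the two cases apart. The class name and attribute values are assumptions for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ElementDemo {

  static final String XLINK = "http://www.w3.org/1999/xlink";

  static String demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    Element img = doc.createElement("img");
    img.setAttribute("src", "logo.png");
    // The NS setter takes a qualified name; the NS getter takes a local name
    img.setAttributeNS(XLINK, "xlink:href", "http://www.example.com/");

    String src = img.getAttribute("src");
    String href = img.getAttributeNS(XLINK, "href");
    String missing = img.getAttribute("alt");  // empty string, not null
    img.removeAttribute("src");

    return src + "|" + href + "|" + missing.isEmpty()
     + "|" + img.hasAttribute("alt") + "|" + img.hasAttribute("src");
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```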

Entity

The Entity interface represents an entity node. It does not appear directly in the tree. Instead an EntityReference node appears in the tree. The name of the EntityReference identifies a member of the document’s entities map, which is accessible through the DocumentType interface. If the Entity object represents a parsed entity, and the parser resolved the entity, then this node will have children representing its replacement text. However all aspects of the Entity object including all its children are read-only. They may not be modified or changed in any way.

package org.w3c.dom;

public interface Entity extends Node {

  public String getPublicId();
  public String getSystemId();
  public String getNotationName();

}

The node properties of Entity are defined as follows:

node name: The name of the entity
node value: null
local name: null
namespace URI: null
namespace prefix: null

Since Entity objects are not part of the tree, they have neither parents nor siblings. getParentNode(), getPreviousSibling(), and getNextSibling() all return null when invoked on an Entity object.

EntityReference

The EntityReference interface represents a parsed entity reference which appears in the document tree. Parsers are not required to use this interface. Some parsers silently resolve all entity references to their replacement text. If a parser does not resolve external entity references, then it must include EntityReference objects instead, though the only information available from these objects will be the name. A parser that does resolve external entity references and chooses to include EntityReference objects anyway will also set the children of this node so as to represent the entity’s replacement text. In this case, you can use the methods inherited from the Node superinterface to walk the entity’s tree. However, all these children and their descendants are completely read-only. You cannot change them in any way. If you need to modify them, you must first clone each of the EntityReference’s children, and replace the EntityReference with the cloned children.

package org.w3c.dom;

public interface EntityReference extends Node {

}

EntityReference objects are never used for the five predefined entity references (&lt;, &gt;, &amp;, &quot;, and &apos;) or for character references such as &#xA0; or &#160;.

The node properties of EntityReference are defined as follows:

node name: The name of the entity
node value: null
local name: null
namespace URI: null
namespace prefix: null

NamedNodeMap

DOM uses NamedNodeMap data structures to hold unordered sets of attributes, notations, and entities. You can iterate through a map using the item() and getLength() methods. The first item in the map is at index 0. However, the particular order the implementation chooses is not significant or even reproducible.

package org.w3c.dom;

public interface NamedNodeMap  {

  public Node getNamedItem(String name);
  public Node setNamedItem(Node node) throws DOMException;
  public Node removeNamedItem(String name) throws DOMException;
  public Node item(int index);
  public int  getLength();
  public Node getNamedItemNS(String namespaceURI, 
   String localName);
  public Node setNamedItemNS(Node node) throws DOMException;
  public Node removeNamedItemNS(String namespaceURI, 
   String localName) throws DOMException;

}

NamedNodeMaps are live. That is, adding an item to the map or removing an item from the map will add it to or remove it from whatever construct produced the map in the first place.
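
The sketch below iterates an element’s attribute map and then removes an item through the map to show the liveness. Because the order is not significant, the names are sorted before being compared. The class name and attribute names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

class NamedNodeMapDemo {

  static String demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    Element point = doc.createElement("point");
    point.setAttribute("x", "1");
    point.setAttribute("y", "2");
    point.setAttribute("z", "3");

    NamedNodeMap attributes = point.getAttributes();
    // Sort the names because the map's own order is arbitrary
    Set<String> names = new TreeSet<>();
    for (int i = 0; i < attributes.getLength(); i++) {
      names.add(attributes.item(i).getNodeName());
    }

    // Removing from the map removes the attribute from the element
    attributes.removeNamedItem("z");
    return names + " " + point.hasAttribute("z")
     + " " + attributes.getLength();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```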

Node

Node is the key superinterface for almost all the other interfaces in the org.w3c.dom package. It is the primary means by which you navigate, search, query, and occasionally even update an XML document with DOM.

package org.w3c.dom;

public interface Node  {

  // Node type constants
  public static final short ELEMENT_NODE                = 1;
  public static final short ATTRIBUTE_NODE              = 2;
  public static final short TEXT_NODE                   = 3;
  public static final short CDATA_SECTION_NODE          = 4;
  public static final short ENTITY_REFERENCE_NODE       = 5;
  public static final short ENTITY_NODE                 = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE                = 8;
  public static final short DOCUMENT_NODE               = 9;
  public static final short DOCUMENT_TYPE_NODE          = 10;
  public static final short DOCUMENT_FRAGMENT_NODE      = 11;
  public static final short NOTATION_NODE               = 12;

  // Basic getter methods
  public String   getNodeName();
  public String   getNodeValue() throws DOMException;
  public void     setNodeValue(String value) throws DOMException;
  public short    getNodeType();
  public String   getNamespaceURI();
  public String   getPrefix();
  public void     setPrefix(String prefix) throws DOMException;
  public String   getLocalName();

  // Navigation methods
  public Node     getParentNode();
  public boolean  hasChildNodes();
  public NodeList getChildNodes();
  public Node     getFirstChild();
  public Node     getLastChild();
  public Node     getPreviousSibling();
  public Node     getNextSibling();
  public Document getOwnerDocument();
  
  // Attribute methods
  public boolean      hasAttributes();
  public NamedNodeMap getAttributes();

  // Tree modification methods
  public Node     insertBefore(Node newChild, Node refChild) 
   throws DOMException;
  public Node     replaceChild(Node newChild, Node oldChild) 
   throws DOMException;
  public Node     removeChild(Node oldChild) throws DOMException;
  public Node     appendChild(Node newChild) throws DOMException;

  // Utility methods
  public Node     cloneNode(boolean deep);
  public void     normalize();
  public boolean  isSupported(String feature, String version);

}
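
A typical use of the navigation methods is a recursive walk over the tree. The sketch below counts element nodes by comparing getNodeType() against a node type constant; the class name and sample document are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeWalkDemo {

  // Counts the element nodes in the subtree rooted at node
  static int countElements(Node node) {
    int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
    for (Node child = node.getFirstChild(); child != null;
     child = child.getNextSibling()) {
      count += countElements(child);
    }
    return count;
  }

  static int demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader("<a><b/><c><d/></c>text</a>")));
    return countElements(doc);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```

Walking with getFirstChild() and getNextSibling() avoids creating a NodeList for every node visited.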

NodeList

NodeList is the basic DOM list type. These are most commonly used for lists of children of an Element or Document. The index of the first item in the list is 0, like Java arrays.

The actual data structure used to implement the list can vary from implementation to implementation. However, one constant is that the lists are live. In other words, if a node is deleted or moved from its parent, then it is also deleted from all lists that were built from the children of that parent. Similarly, if a new node is added to some node, then it is also added to all lists that point to the children of that node.

package org.w3c.dom;

public interface NodeList  {

  public Node item(int index);
  public int  getLength();

}
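
Liveness matters most when a loop changes the tree it is iterating over. The sketch below removes every child of a node by repeatedly removing item(0), since a conventional indexed loop would skip nodes as the list shrinks underneath it. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListDemo {

  // Removes all children of parent and returns how many were removed
  static int removeChildren(Node parent) {
    NodeList children = parent.getChildNodes();
    int removed = 0;
    // The list shrinks after each removal, so always take the first item
    while (children.getLength() > 0) {
      parent.removeChild(children.item(0));
      removed++;
    }
    return removed;
  }

  static String demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader("<list><a/><b/><c/></list>")));
    Node list = doc.getDocumentElement();
    int removed = removeChildren(list);
    return removed + " " + list.hasChildNodes();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```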

Notation

The Notation interface represents a notation declared in the document’s DTD. It does not have a position in the tree. However, the complete list of notations in the document is accessible through the getNotations() method of the DocumentType interface. Both this list and the individual Notation objects are read-only.

package org.w3c.dom;

public interface Notation extends Node {

  public String getPublicId();
  public String getSystemId();

}

The node properties of Notation are defined as follows:

node name: notation name
node value: null
local name: null
namespace URI: null
namespace prefix: null

ProcessingInstruction

The ProcessingInstruction interface represents a processing instruction node. Its node properties are defined as follows:

node name: the target
node value: the data
local name: null
namespace URI: null
namespace prefix: null

package org.w3c.dom;

public interface ProcessingInstruction extends Node {

  public String getTarget();
  public String getData();
  public void   setData(String data) throws DOMException;

}

Text

The Text interface represents a text node. It can contain any characters that are legal in XML text including characters like < and & that may need to be escaped when the document is serialized. When a parser reads an XML document and builds a DOM tree, each Text object will contain the longest possible contiguous run of text. However, DOM does not maintain this constraint as the document is manipulated in memory. Its node properties are defined as follows:

node name: #text
node value: the text of the node
local name: null
namespace URI: null
namespace prefix: null

The Text interface only declares one method of its own, splitText(). Most of its functionality is inherited from the superinterfaces CharacterData and Node.

package org.w3c.dom;

public interface Text extends CharacterData {

  public Text splitText(int offset) throws DOMException;

}
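
The sketch below shows how in-memory edits can leave adjacent text nodes, and how normalize() (inherited from Node) merges them again. The class name and sample text are assumptions for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class TextDemo {

  static String demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    Element p = doc.createElement("p");
    doc.appendChild(p);
    Text head = doc.createTextNode("Hello World");
    p.appendChild(head);

    // splitText() keeps the first 5 characters in head and returns
    // a new sibling node holding the rest
    Text tail = head.splitText(5);
    String split = head.getData() + "|" + tail.getData()
     + "|" + p.getChildNodes().getLength();

    // normalize() merges the adjacent text nodes back into one
    p.normalize();
    return split + "|" + p.getChildNodes().getLength()
     + "|" + p.getFirstChild().getNodeValue();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```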

Exceptions and Errors

DOM Level 2 defines only one exception class, DOMException. This is a runtime exception used for almost anything that can go wrong while constructing or manipulating a DOM Document. The details are provided by a public short field, code, which is set to one of several named constants.

package org.w3c.dom;

public class DOMException extends RuntimeException {

  public short code;
  
  public static final short INDEX_SIZE_ERR              = 1;
  public static final short DOMSTRING_SIZE_ERR          = 2;
  public static final short HIERARCHY_REQUEST_ERR       = 3;
  public static final short WRONG_DOCUMENT_ERR          = 4;
  public static final short INVALID_CHARACTER_ERR       = 5;
  public static final short NO_DATA_ALLOWED_ERR         = 6;
  public static final short NO_MODIFICATION_ALLOWED_ERR = 7;
  public static final short NOT_FOUND_ERR               = 8;
  public static final short NOT_SUPPORTED_ERR           = 9;
  public static final short INUSE_ATTRIBUTE_ERR         = 10;
  public static final short INVALID_STATE_ERR           = 11;
  public static final short SYNTAX_ERR                  = 12;
  public static final short INVALID_MODIFICATION_ERR    = 13;
  public static final short NAMESPACE_ERR               = 14;
  public static final short INVALID_ACCESS_ERR          = 15;

  public DOMException(short code, String message);

}
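
Since there is only one exception class, code normally distinguishes failures by inspecting the code field. The sketch below triggers two common errors, assuming the default error checking of the JAXP implementation bundled with the JDK. The class name and the helper codeFor() are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class DOMExceptionDemo {

  // Runs an action and returns the DOMException code it raised, or -1
  static short codeFor(Runnable action) {
    try {
      action.run();
      return -1;
    }
    catch (DOMException ex) {
      return ex.code;
    }
  }

  static short[] demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    doc.appendChild(doc.createElement("root"));

    // A document may have only one root element
    short secondRoot = codeFor(
     () -> doc.appendChild(doc.createElement("extra")));
    // Element names may not contain spaces
    short badName = codeFor(() -> doc.createElement("not a name"));
    return new short[] { secondRoot, badName };
  }

  public static void main(String[] args) throws Exception {
    short[] codes = demo();
    System.out.println(codes[0] == DOMException.HIERARCHY_REQUEST_ERR);
    System.out.println(codes[1] == DOMException.INVALID_CHARACTER_ERR);
  }

}
```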

org.w3c.dom.traversal

The DOM traversal API in the org.w3c.dom.traversal package provides some convenience classes for navigating and searching an XML document. The most useful aspect of this package is the ability to get lists and trees that contain only the kinds of nodes that you’re interested in while ignoring everything else.

DocumentTraversal

DocumentTraversal is a factory interface for creating new NodeIterator and TreeWalker objects that present a filtered view of the content of an element or a document. (You can filter other kinds of nodes too, but there’s not a lot of point to this if they don’t have any children.)

In implementations that support the traversal API (which can be determined by passing "Traversal" and "2.0" to the hasFeature() method of DOMImplementation or the isSupported() method of the Document node), all objects that implement Document also implement DocumentTraversal. That is, to get a DocumentTraversal object, just cast a Document to DocumentTraversal.

package org.w3c.dom.traversal;

public interface DocumentTraversal  {

  public NodeIterator createNodeIterator(Node root, 
   int whatToShow, NodeFilter filter, boolean expandEntities) 
   throws DOMException;
  public TreeWalker createTreeWalker(Node root, int whatToShow, 
   NodeFilter filter, boolean expandEntities) 
   throws DOMException;

}
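
The feature check and the cast might look like the following sketch, which assumes a DOM implementation that supports traversal, such as the one bundled with the JDK. The class name and the helper elementNames() are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class TraversalDemo {

  // Lists the names of all elements in document order
  static List<String> elementNames(Document doc) {
    if (!doc.getImplementation().hasFeature("Traversal", "2.0")) {
      throw new UnsupportedOperationException("Traversal not supported");
    }
    DocumentTraversal traversal = (DocumentTraversal) doc;
    NodeIterator iterator = traversal.createNodeIterator(
     doc, NodeFilter.SHOW_ELEMENT, null, true);
    List<String> names = new ArrayList<>();
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
      names.add(n.getNodeName());
    }
    iterator.detach();
    return names;
  }

  static List<String> demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader("<a><b/><c><d/></c></a>")));
    return elementNames(doc);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```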

NodeFilter

The NodeFilter interface is used by NodeIterators and TreeWalkers to determine which nodes are included in the view of the document they present to the client. Each node in the subtree that passes the whatToShow test is passed to the filter’s acceptNode() method. This returns one of three named constants: NodeFilter.FILTER_ACCEPT (include the node); NodeFilter.FILTER_REJECT (when tree walking, do not include the node or any of its descendants; when iterating, do not include the node but do consider its descendants); or NodeFilter.FILTER_SKIP (do not include the node, but do include its children if they pass the filter individually).

In addition, this interface has thirteen named constants that can be combined with the bitwise operators and passed to createNodeIterator() and createTreeWalker() to specify which kinds of nodes should be included in their views.

package org.w3c.dom.traversal;

public interface NodeFilter  {

  public static final short FILTER_ACCEPT = 1;
  public static final short FILTER_REJECT = 2;
  public static final short FILTER_SKIP   = 3;

  public static final int SHOW_ALL                    = 0xFFFFFFFF;
  public static final int SHOW_ELEMENT                = 0x00000001;
  public static final int SHOW_ATTRIBUTE              = 0x00000002;
  public static final int SHOW_TEXT                   = 0x00000004;
  public static final int SHOW_CDATA_SECTION          = 0x00000008;
  public static final int SHOW_ENTITY_REFERENCE       = 0x00000010;
  public static final int SHOW_ENTITY                 = 0x00000020;
  public static final int SHOW_PROCESSING_INSTRUCTION = 0x00000040;
  public static final int SHOW_COMMENT                = 0x00000080;
  public static final int SHOW_DOCUMENT               = 0x00000100;
  public static final int SHOW_DOCUMENT_TYPE          = 0x00000200;
  public static final int SHOW_DOCUMENT_FRAGMENT      = 0x00000400;
  public static final int SHOW_NOTATION               = 0x00000800;

  public short acceptNode(Node node);

}
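
The difference in how FILTER_REJECT is treated can be seen by giving the same filter to a NodeIterator and a TreeWalker. This is a sketch only, assuming a DOM implementation that supports traversal; the class name, the hidden attribute, and the sample document are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class NodeFilterDemo {

  // Rejects any element that carries a hidden attribute
  static final NodeFilter VISIBLE = new NodeFilter() {
    public short acceptNode(Node node) {
      if (node instanceof Element
       && ((Element) node).hasAttribute("hidden")) {
        return NodeFilter.FILTER_REJECT;
      }
      return NodeFilter.FILTER_ACCEPT;
    }
  };

  static String demo() throws Exception {
    String xml = "<root><a hidden='true'><b/></a><c/></root>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
    DocumentTraversal traversal = (DocumentTraversal) doc;

    // Iterating: a is left out, but its child b is still visited
    List<String> iterated = new ArrayList<>();
    NodeIterator iterator = traversal.createNodeIterator(
     doc, NodeFilter.SHOW_ELEMENT, VISIBLE, true);
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
      iterated.add(n.getNodeName());
    }

    // Tree walking: a is left out along with its whole subtree
    List<String> walked = new ArrayList<>();
    TreeWalker walker = traversal.createTreeWalker(
     doc, NodeFilter.SHOW_ELEMENT, VISIBLE, true);
    for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
      walked.add(n.getNodeName());
    }
    return iterated + " " + walked;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```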

NodeIterator

The NodeIterator interface presents a subset of nodes from the document as a list in document order. The list is live; that is, changes to the document are reflected in the list.

package org.w3c.dom.traversal;

public interface NodeIterator  {

  public Node       nextNode() throws DOMException;
  public Node       previousNode() throws DOMException;

  public Node       getRoot();
  public int        getWhatToShow();
  public NodeFilter getFilter();
  public boolean    getExpandEntityReferences();
  
  public void       detach();

}
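
As a small sketch of the flattened view, the following collects every comment in a document regardless of nesting depth. It assumes a traversal-capable implementation and a parser that keeps comments, which is the JAXP default; the class name and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class NodeIteratorDemo {

  // Returns the text of every comment in document order
  static List<String> comments(Document doc) {
    NodeIterator iterator = ((DocumentTraversal) doc).createNodeIterator(
     doc, NodeFilter.SHOW_COMMENT, null, true);
    List<String> result = new ArrayList<>();
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
      result.add(n.getNodeValue());
    }
    // Release the iterator once it is no longer needed
    iterator.detach();
    return result;
  }

  static List<String> demo() throws Exception {
    String xml = "<doc><!--one--><p>text<!--two--></p></doc>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
    return comments(doc);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```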

TreeWalker

The TreeWalker interface presents a subset of nodes from the document as a tree. Walking the TreeWalker is much like walking a full Document or Element, except that many of the node’s descendants which you aren’t interested in can be filtered out so they don’t get in your way. The tree is live; that is, changes to the document are reflected in the tree.

package org.w3c.dom.traversal;

public interface TreeWalker  {

  public Node parentNode();
  public Node firstChild();
  public Node lastChild();
  public Node previousSibling();
  public Node nextSibling();
  public Node previousNode();
  public Node nextNode();

  public Node       getRoot();
  public int        getWhatToShow();
  public NodeFilter getFilter();
  public boolean    getExpandEntityReferences();
  public Node       getCurrentNode();
  public void       setCurrentNode(Node node) 
   throws DOMException;

}
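
One common use, sketched below, is stepping through child elements without tripping over the white space text nodes and comments between them. The example assumes a traversal-capable implementation; the class name and the menu document are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class TreeWalkerDemo {

  static final String XML =
     "<menu>\n"
   + "  <!-- starters -->\n"
   + "  <dish>soup</dish>\n"
   + "  <dish>salad</dish>\n"
   + "</menu>";

  static List<String> demo() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(XML)));
    // An element-only view rooted at the menu element
    TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
     doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

    List<String> dishes = new ArrayList<>();
    // firstChild() and nextSibling() move the walker's current node
    for (Node n = walker.firstChild(); n != null; n = walker.nextSibling()) {
      // The walker returns real nodes, so ordinary Node methods still
      // see the unfiltered children, including text
      dishes.add(n.getFirstChild().getNodeValue());
    }
    return dishes;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }

}
```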

Copyright 2001, 2002 Elliotte Rusty Harold (elharo@metalab.unc.edu). Last Modified July 27, 2002.