The Element Class

The structure of an XML document is based on its elements, so it should come as no surprise that the Element class is one of the largest and most important classes in JDOM. Since JDOM does not have any generic Node class or interface, the Element class is the primary means by which a program navigates the tree to find particular content.

Each Element object has the following seven basic properties:

Local name

A String that is initialized when the Element is constructed, and which can never be null or the empty string. It is accessible through the setName() and getName() methods:

public Element setName(String name)
    throws IllegalNameException;

public String getName();

Namespace

A Namespace object which encapsulates both the namespace URI and an optional prefix. This can be the named constant Namespace.NO_NAMESPACE if the element does not have a namespace. A namespace is always set when the Element is constructed but can be changed by setNamespace(). It can be read by the getNamespace() method.

public Element setNamespace(Namespace namespace);
public Namespace getNamespace();

Content

A List with no duplicates containing all the element’s children in order. This is accessible through the getContent() and setContent() methods. The list is live, so you can change the contents of the Element using the methods of the List interface.

public Element setContent(List list)
    throws IllegalAddException;

public List getContent();

In addition, individual nodes can be added to and removed from the list via the addContent() and removeContent() methods.

Parent

The parent Element that contains this Element. It will be null if this is the root element, and may be null if this Element is not currently part of a Document. This is accessible through the getParent() method:

public Element getParent();

The parent can be changed only by adding the Element to a new parent using the parent’s addContent() method. This is only possible if the Element does not already have a parent. Before a parent can adopt a child Element, the child’s detach() method must be invoked to remove it from its current parent:

public Element detach();

Owner document

The Document that contains this Element. It will be null if this Element is not currently part of a Document. It can be read by the getDocument() method:

public Document getDocument();

It can be changed by adding the Element to a new document after first detaching it from its previous parent with the detach() method.

Attributes

A List containing Attribute objects, one for each of the element’s attributes. Although JDOM stores attributes in a list for convenience, order is not significant and is not guaranteed to match the order in which the attributes appeared in the original document. The list is accessible through the getAttributes() and setAttributes() methods:

public Element setAttributes(List attributes)
    throws IllegalAddException;

public List getAttributes();

The items in this list can be read and modified via the getAttribute(), getAttributeValue(), and setAttribute() methods. Attributes that declare namespaces are not included in this list.

Additional namespaces

A List containing Namespace objects, one for each additional namespace prefix declared by the element (that is, other than those that declare the namespace of the element and the namespaces of its attributes). As with the list of attributes, order is not significant. The entire list is accessible through the getAdditionalNamespaces() method:

public List getAdditionalNamespaces();

Namespaces can be added to and removed from the list using the addNamespaceDeclaration() and removeNamespaceDeclaration() methods:

public Element addNamespaceDeclaration(Namespace namespace);
public Element removeNamespaceDeclaration(Namespace namespace);

In addition, there are some other properties that are derived from the above seven rather than independent of them. For instance, the namespace URI, namespace prefix, and qualified name are separately readable through the getNamespaceURI(), getNamespacePrefix(), and getQualifiedName() convenience methods:

public String getNamespaceURI();
public String getNamespacePrefix();
public String getQualifiedName();

These just return the relevant parts of the element’s namespace and name.
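Conceptually, the qualified name is just the prefix and the local name joined by a colon, or the bare local name when the prefix is empty. The sketch below models that derivation in plain Java; it is not JDOM's source code, and the class and method names (QualifiedNames, qualifiedName) are hypothetical.

```java
// Hypothetical helper modeling how a qualified name is derived from a
// namespace prefix and a local name. This is not JDOM code.
final class QualifiedNames {

  private QualifiedNames() {}

  // An empty prefix means a default namespace or no namespace at all,
  // so the qualified name is simply the local name.
  static String qualifiedName(String prefix, String localName) {
    if (prefix == null || prefix.isEmpty()) {
      return localName;
    }
    return prefix + ":" + localName;
  }
}
```

For the soapRoot element constructed later in this section, the prefix "SOAP-ENV" and local name "Envelope" yield "SOAP-ENV:Envelope", while the unprefixed xhtmlRoot element's qualified name is just "html".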

All these getter methods behave pretty much like any other getter methods. That is, they return an object of the relevant type, generally a String, and do not throw any exceptions. The setter methods, however, are more unusual. This is one of the few areas where JDOM does not follow standard Java conventions. Instead of returning void, these methods all return the Element object that invoked the method. That is, a.setFoo(b) returns a. Many other methods you'd naturally expect to return void also do this. The purpose is to allow setters to be chained. For example, this code fragment builds up an entire channel element in a single statement:

 Element channel = (new Element("channel"))
 .addContent((new Element("title")).setText("Cafe con Leche"))
 .addContent((new Element("link"))
  .setText("http://www.cafeconleche.org/"))
 .addContent((new Element("description"))
  .setText("XML News"));

Caution

I don’t personally find this style of code any easier to write or read than the multi-statement approach. However, it is the reason the adder and setter methods all return the object that did the adding or setting, so I felt compelled to show it to you. Still, I strongly recommend that you don’t use it.
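The mechanics behind chaining are simple: each setter or adder ends with return this. Here is a minimal, self-contained sketch of that design using a hypothetical FluentNode class (not part of JDOM), which shows why an expression like the channel example type-checks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an element, illustrating "return this"
// setters. This is not JDOM's Element class.
final class FluentNode {

  private final String name;
  private String text = "";
  private final List<FluentNode> children = new ArrayList<>();

  FluentNode(String name) {
    this.name = name;
  }

  // Returning this instead of void is what makes chaining possible.
  FluentNode setText(String text) {
    this.text = text;
    return this;
  }

  FluentNode addContent(FluentNode child) {
    children.add(child);
    return this;
  }

  String getName() { return name; }

  String getText() { return text; }

  List<FluentNode> getChildren() {
    return Collections.unmodifiableList(children);
  }
}
```

The multi-statement style recommended above works with the same class: create each node in its own statement and call addContent() separately, simply ignoring the returned reference.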

Constructors

The four public Element constructors all require you to specify a local name as a String. If the element is in a namespace, then you also need to specify the namespace URI as a String or a Namespace object. The prefix can also be specified, either as a String or as part of a Namespace object.

public Element(String localName)
    throws IllegalNameException;

public Element(String localName, Namespace namespace)
    throws IllegalNameException;

public Element(String localName, String namespaceURI)
    throws IllegalNameException;

public Element(String localName, String prefix, String namespaceURI)
    throws IllegalNameException;

For example, this code fragment creates four Element objects using the various constructors:

Element xmlRPCRoot = new Element("methodCall");
Element xhtmlRoot = new Element("html", 
 "http://www.w3.org/1999/xhtml");
Element soapRoot = new Element("Envelope", "SOAP-ENV", 
 "http://schemas.xmlsoap.org/soap/envelope/");
Namespace xsd = Namespace.getNamespace("xsd", 
 "http://www.w3.org/2001/XMLSchema");
Element schemaRoot = new Element("schema", xsd);

Navigation and Search

As you learned in the last chapter, the getContent() method is the fundamental means of navigating through an XML document with JDOM. This method returns a live List which includes all the children of an element including comments, processing instructions, text nodes, and elements. To search deeper, you apply getContent() to the child elements of the current element, normally through recursion.

For example, here’s a simple program that walks the XML document tree, starting at the root element, and prints out the content of the various properties of each element. This is not the most interesting program in the book, but it does demonstrate all the major getter methods and basic navigation.

Pay special attention to the process() method. You have to write a method very much like this for any JDOM program that needs to search an entire XML document. It begins with an Element (normally the root element) and recursively applies itself to all the child elements of that element. The instanceof operator tests each object in the Element’s content list to determine its type and dispatch it to the right method. Here, TreePrinter dispatches Element objects to the process() method recursively and ignores all other objects.

Example 15.2. Inspecting elements

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;


public class TreePrinter {

  // Recursively descend the tree
  public static void process(Element element) {
    
    inspect(element);
    List content = element.getContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        Element child = (Element) o;
        process(child);
      }
    }
    
  }

  // Print the properties of each element
  public static void inspect(Element element) {
    
    if (!element.isRootElement()) {
      // Print a blank line to separate it from the previous
      // element.
      System.out.println(); 
    }
    
    String qualifiedName = element.getQualifiedName();
    System.out.println(qualifiedName + ":");
    
    Namespace namespace = element.getNamespace();
    if (namespace != Namespace.NO_NAMESPACE) {
      String localName = element.getName();
      String uri = element.getNamespaceURI();
      String prefix = element.getNamespacePrefix();
      System.out.println("  Local name: " + localName);
      System.out.println("  Namespace URI: " + uri);
      if (!"".equals(prefix)) {
        System.out.println("  Namespace prefix: " + prefix);
      }
    }
    List attributes = element.getAttributes();
    if (!attributes.isEmpty()) {
      Iterator iterator = attributes.iterator();
      while (iterator.hasNext()) {
        Attribute attribute = (Attribute) iterator.next();
        String name = attribute.getName();
        String value = attribute.getValue();
        Namespace attributeNamespace = attribute.getNamespace();
        if (attributeNamespace == Namespace.NO_NAMESPACE) {
          System.out.println("  " + name + "=\"" + value + "\""); 
        }
        else {
          String prefix = attributeNamespace.getPrefix();
          System.out.println(
           "  " + prefix + ":" + name + "=\"" + value + "\""); 
        }
      }
    }
    
    List namespaces = element.getAdditionalNamespaces();
    if (!namespaces.isEmpty()) {
      Iterator iterator = namespaces.iterator();
      while (iterator.hasNext()) {
        Namespace additional = (Namespace) iterator.next();
        String uri = additional.getURI();
        String prefix = additional.getPrefix();
        System.out.println(
         "  xmlns:" + prefix + "=\"" + uri + "\"");
      }
    }
    
  }
  
  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java TreePrinter URL");
      return;
    }
    
    String url = args[0];
    
    try {
      SAXBuilder parser = new SAXBuilder();
      
      // Parse the document
      Document document = parser.build(url); 
      
      // Process the root element
      process(document.getRootElement());

    }
    catch (JDOMException e) {
      System.out.println(url + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not read " + url
      ); 
    }
     
  } // end main

}

Here’s the beginning of the output when this chapter’s XML source code is fed into TreePrinter. DocBook doesn’t use namespaces, but the XInclude elements do. The root element has some attributes, but most of the structure is based on element name alone.

D:\books\XMLJAVA>java TreePrinter jdom_model.xml
chapter:
  revision="20020430"
  status="rough"
  id="ch_jdom_model"
  xmlns:xinclude="http://www.w3.org/2001/XInclude"

title:

para:

para:

itemizedlist:

listitem:

para:

classname:
…

While in theory you could navigate and query a document using only the List objects returned by getContent(), JDOM provides many methods to simplify the process for special cases including methods that return lists containing child elements only, methods that return particular named child elements, methods that return the complete text of an element, methods that return the text of a child element, methods to remove children identified by name and reference, methods that return the first child of an element, and more.

Child Elements

The Element class has two methods (five total when you count overloaded variants separately) that only operate on the child elements of an element, and not on other content like processing instructions and text nodes. These are getChildren() and removeChildren():

public List getChildren();
public List getChildren(String name);
public List getChildren(String name, Namespace namespace);
public List removeChildren(String name);
public List removeChildren(String name, Namespace namespace);

These methods are similar to getContent() and removeContent() except that the lists returned only contain child elements, never other kinds of children like comments and processing instructions. [1] The getChildren() methods simply ignore non-elements. For instance, the earlier TreePrinter example only considered elements. Consequently, it could use the getChildren() method instead of getContent():

  public static void process(Element element) {
    
    inspect(element);
    List content = element.getChildren();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();  
      Element child = (Element) o;
      process(child);
    }
    
  }

This eliminates one instanceof check and one if block. This is not a huge savings, I admit, but the code is marginally more readable. However, because JDOM uses Java’s Object-based List interface, you still have to cast all the items in the list getChildren() returns to Element.
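If the casts bother you, one option is a small generic helper that copies only the items of a given type out of a raw, Object-based list. The sketch below is plain Java with no JDOM dependency, and the names TypedLists and childrenOfType are hypothetical. With JDOM you would presumably pass in the list from getChildren() or getContent() together with Element.class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical utility: copy only the items of one type out of a raw,
// Object-based list such as the ones JDOM returns.
final class TypedLists {

  private TypedLists() {}

  static <T> List<T> childrenOfType(List<?> raw, Class<T> type) {
    List<T> result = new ArrayList<>();
    for (Object o : raw) {
      if (type.isInstance(o)) {
        result.add(type.cast(o));  // checked cast; cannot fail after isInstance()
      }
    }
    return result;
  }
}
```

Note one design consequence: unlike the live list getChildren() returns, the result is a snapshot copy, so removing items from it does not change the element.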

The removeChildren() methods remove all the elements that match the specified name and namespace URI. If no namespace URI is given, then it removes elements with the given name in no namespace. Other content—comments, processing instructions, text, etc.—is not touched.

For example, this method recursively descends through an element, cutting out all the note elements.

  public static void cutNotes(Element element) {
    
    element.removeChildren("note");
    // The element's children have changed so we have to call
    // getChildren() again
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();  
      Element child = (Element) o;
      cutNotes(child);
    }
    
  }

It’s important to remember that when an element is removed, the entire element is removed, not just its start- and end-tags. Any content inside the element is lost, including, in this case, elements that aren’t named note.

Single children

Often you want to follow a very specific path through a document. For instance, consider the XML-RPC request document in Example 15.3. A program that reads this is probably primarily concerned with the content of the string element.

Example 15.3. An XML-RPC request document

<?xml version="1.0"?>
<methodCall>
  <methodName>getQuote</methodName>
  <params>
    <param>
      <value><string>RHAT</string></value>
    </param>
  </params>
</methodCall>

To get the string element, you’ll ask for the string child element of the value child element of the param child element of the params child element of the root element. Rather than iterating through a list of all the child elements when there’s only one of each of these, you can ask for the one you want directly using one of the getChild() methods:

public Element getChild(String name);
public Element getChild(String name, Namespace namespace);

For example,

Element root   = document.getRootElement();
Element params = root.getChild("params");
Element param  = params.getChild("param");
Element value  = param.getChild("value");
Element symbol = value.getChild("string");

Or, more concisely,

Element symbol = document.getRootElement()
                  .getChild("params")
                  .getChild("param")
                  .getChild("value")
                  .getChild("string");

These methods have two nasty problems. The first is that they return only the first matching child: if there’s more than one child element with the specified name and namespace, you still get only the first one. The second is that if there’s no such child, getChild() returns null, so the next call in a chain like the one above throws a NullPointerException. Since both of these are very real possibilities in many applications, including XML-RPC, you should normally prefer getChildren() unless you’ve used some form of schema or DTD to verify that there’s exactly one of each child you address with these methods. getChildren() always returns a non-null list you can safely iterate through to process anywhere from zero to thousands of child elements.
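When a document hasn’t been validated, a strict lookup that fails loudly can be safer than getChild(). JDOM isn’t part of the JDK, so this sketch shows the idea with the JDK’s built-in DOM (javax.xml.parsers and org.w3c.dom) instead; note that Element here means org.w3c.dom.Element, and the helper names parse and requireChild are hypothetical. The lookup insists on exactly one child element with the given name and throws otherwise, rather than returning null or silently picking the first match.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch using the JDK's DOM, not JDOM: a strict single-child lookup.
final class StrictChildren {

  private StrictChildren() {}

  // Parse an XML string, wrapping checked exceptions for brevity.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Return the only child element called name. Throw if there are
  // none or several, instead of returning null or the first match.
  static Element requireChild(Element parent, String name) {
    Element found = null;
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node node = children.item(i);
      if (node.getNodeType() == Node.ELEMENT_NODE
          && node.getNodeName().equals(name)) {
        if (found != null) {
          throw new IllegalStateException("More than one " + name + " child");
        }
        found = (Element) node;
      }
    }
    if (found == null) {
      throw new IllegalStateException("No " + name + " child");
    }
    return found;
  }
}
```

Nesting the calls, as in requireChild(requireChild(root, "params"), "param") and so on down to string, reproduces the concise path above but reports exactly which step failed.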

Similarly, you can remove a single named child element with one of the two removeChild() methods, each of which returns the removed Element in case you want to save it for later use:

public Element removeChild(String name);
public Element removeChild(String name, Namespace namespace);

The removeChild() method shares with getChild() the problem of operating only on the first such element. However, after you’ve removed the first child, the second child is now the first. After you’ve removed that one, the original third child is now the first, and so on. This opens up an option that doesn’t work with getChild(): you can simply call removeChild() repeatedly until it returns null, indicating that there are no more such children. For example, this code fragment removes all the immediate note children of the Element named element:

while (element.removeChild("note") != null) ;

However, unlike the earlier example with removeChildren(), this is not recursive and will not find note elements deeper in the tree.
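The repeated-removal idiom works because each call removes only the current first match. java.util.List.remove(Object) has the same first-occurrence semantics, so the loop can be tried out on a plain list of names with no JDOM involved. The strings here are hypothetical stand-ins for child elements, and the class name is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogue of: while (element.removeChild("note") != null) ;
// List.remove(Object) removes only the first equal item and returns
// false once none is left, much as removeChild() returns null.
final class RemoveFirstLoop {

  private RemoveFirstLoop() {}

  static List<String> removeEvery(List<String> children, String name) {
    List<String> copy = new ArrayList<>(children);
    while (copy.remove(name)) {
      // Each pass removes the current first match; the next match
      // then becomes the first one.
    }
    return copy;
  }
}
```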

Getting and setting the text of an element

Sometimes what you want is the text of an element. For this purpose, JDOM provides these four methods:

public String getText();
public String getTextTrim();
public String getTextNormalize();
public Element setText(String text);

The getText() method returns the PCDATA content of the element. The getTextTrim() method returns pretty much the same content except that all leading and trailing whitespace has been removed. The getTextNormalize() method not only strips all leading and trailing whitespace; it also converts each internal run of whitespace to a single space. For example, consider this street element:

<street> 135  Airline  Highway </street>

For this element, getText() returns “ 135  Airline  Highway ” with the whitespace unchanged. However, getTextTrim() returns “135  Airline  Highway”, and getTextNormalize() returns “135 Airline Highway”. Which one you want is an application-level decision.
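The difference between the three results is easy to reproduce with plain string operations. This sketch is not JDOM’s implementation: it assumes the XML whitespace characters (space, tab, carriage return, line feed) and collapses each internal run of them to one space, which is how getTextNormalize() is described. The class and method names are hypothetical.

```java
// Plain-Java model of how the text getters treat whitespace.
// Not JDOM code; names are illustrative only.
final class TextWhitespace {

  private TextWhitespace() {}

  // XML whitespace: space, tab, carriage return, line feed.
  private static final String XML_WS = "[ \\t\\r\\n]";

  // Like getTextTrim(): strip leading and trailing whitespace only.
  static String trim(String s) {
    return s.replaceAll("^" + XML_WS + "+|" + XML_WS + "+$", "");
  }

  // Like getTextNormalize(): trim, then collapse each internal run.
  static String normalize(String s) {
    return trim(s).replaceAll(XML_WS + "+", " ");
  }
}
```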

Getting an element’s text is trickier than you might think at first glance. For instance, consider this street element:

<street>135<!-- The building doesn't actually have a number.
                It's next door to 133 -->Airline Highway</street>

getText() returns “135Airline Highway”. It ignores comments and processing instructions as if they weren’t there. For the most part that seems reasonable.

Now consider this street element:

<street>135 Airline Highway <apartment>2B</apartment></street>

getText() returns “135 Airline Highway ”. The content in the child apartment element is completely lost. This is not really a good thing. (I argued about this in the JDOM group, but I lost.) Before you can reliably use any of the getText(), getTextTrim(), or getTextNormalize() methods, you need to be very sure that the element does not have any child elements. One way to do this is to test whether the number of child elements is zero before invoking the text getter. For example,

if (element.getChildren().size() == 0) {
   String result = element.getText();
   // work with result …
}
else {
  // do something more complex …
}

An alternative is to write your own method that recursively descends through the element, accumulating all its text. I’ll demonstrate this in the section on the Text class shortly.
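Until then, here is the rough shape of such a method. Because JDOM isn’t in the JDK, this sketch walks the JDK’s DOM (org.w3c.dom) rather than JDOM objects, but the recursion is the same: append text and CDATA nodes, descend into child elements, and skip comments and processing instructions. The names TextCollector, parse, and allText are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// DOM-based sketch (not JDOM) of recursively accumulating all the text
// inside an element, including text in descendant elements.
final class TextCollector {

  private TextCollector() {}

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  static String allText(Node node) {
    StringBuilder result = new StringBuilder();
    accumulate(node, result);
    return result.toString();
  }

  private static void accumulate(Node node, StringBuilder result) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      switch (child.getNodeType()) {
        case Node.TEXT_NODE:
        case Node.CDATA_SECTION_NODE:
          result.append(child.getNodeValue());
          break;
        case Node.ELEMENT_NODE:
          accumulate(child, result);  // descend instead of dropping it
          break;
        default:
          // Comments and processing instructions contribute nothing.
          break;
      }
    }
  }
}
```

DOM’s own Node.getTextContent() already behaves this way; the explicit recursion is shown because it is the pattern you would have to write yourself against JDOM.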

Do not use any of these getter methods unless you have first validated the document against a DTD or schema that explicitly requires the element only to contain #PCDATA. Do not assume that you “know” that this is true in your domain without individually testing each document. Invariably, sooner or later, you will encounter a document that purports to adhere to the implicit schema, and indeed is very close to it, but does not quite match what you were assuming. Explicit validation is necessary.

The setText() method is a little less fraught. You can set the text content of any element to whatever text you desire. For example, this code fragment sets the text of the street element to the string “3520 Airline Drive”:

street.setText("3520 Airline Drive");

This completely wipes out any existing content the element has: child elements, descendants, comments, processing instructions, other text, etc. If you just want to append the string to the existing text, use the addContent() method instead.

Getting child text

One common pattern in XML documents is an element that contains only other elements, all of which contain only PCDATA, such as this channel element from Slashdot’s RSS file:

<channel>
  <title>Slashdot: News for nerds, stuff that matters</title>
  <link>http://slashdot.org/</link>
  <description>News for nerds, stuff that matters</description>
</channel>

Given such an element, JDOM provides six convenience methods for extracting the text, the trimmed text, and the normalized text from these child elements:

public String getChildText(String name);
public String getChildText(String name, Namespace namespace);
public String getChildTextTrim(String name);
public String getChildTextTrim(String name, Namespace namespace);
public String getChildTextNormalize(String name);
public String getChildTextNormalize(String name, Namespace namespace);

For example, assuming the Element object channel represents the above channel element, this code fragment retrieves the content of the title, link, and description elements:

String title = channel.getChildText("title");
String description = channel.getChildText("description");
String link = channel.getChildText("link");

There are two things I really don’t like about these methods. First, like the getText(), getTextTrim(), and getTextNormalize() methods, they all fail silently if any of the child elements unexpectedly contains child elements. For example, the above code fragment fails massively if Slashdot changes its format and begins distributing content like this instead:

<channel>
  <title>
    <trademark>Slashdot</trademark> 
    <trademark>News for nerds, stuff that matters</trademark>
  </title>
  <link>http://slashdot.org/</link>
  <description>
    <trademark>News for nerds, stuff that matters</trademark>
  </description>
</channel>

Second, these methods fail unexpectedly and silently if any of the child elements are repeated. For example, suppose instead the channel element has three link children like this:

<channel>
  <title>Slashdot: News for nerds, stuff that matters</title>
  <link>http://slashdot.org/</link>
  <link>http://www.slashdot.org/</link>
  <link>http://slashdot.com/</link>
  <description>News for nerds, stuff that matters</description>
</channel>

All three methods return the text from the first link element, and do not bother to inform the client program that there are more it may be interested in.
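If repeated children are legitimate in your data, ask for all of them rather than the first. This sketch again uses the JDK’s org.w3c.dom instead of JDOM, and the helper names are hypothetical. It returns the text of every child element with a given name, and it refuses to flatten a child that itself contains elements rather than silently dropping that markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// DOM-based sketch (not JDOM): the text of every child called name.
final class ChildTexts {

  private ChildTexts() {}

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  static List<String> childTexts(Element parent, String name) {
    List<String> texts = new ArrayList<>();
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node node = children.item(i);
      if (node.getNodeType() == Node.ELEMENT_NODE
          && node.getNodeName().equals(name)) {
        if (hasElementChild(node)) {
          throw new IllegalStateException(name + " contains child elements");
        }
        texts.add(node.getTextContent());
      }
    }
    return texts;
  }

  private static boolean hasElementChild(Node node) {
    NodeList kids = node.getChildNodes();
    for (int i = 0; i < kids.getLength(); i++) {
      if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
        return true;
      }
    }
    return false;
  }
}
```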

As with getText/getTextTrim/getTextNormalize(), do not use any of these methods without first validating the document against a DTD or schema that explicitly requires the child elements only to contain #PCDATA and to occur exactly once each in each parent element.

Filters

You can pass an org.jdom.filter.Filter object to the getContent() method to limit the content returned by the method. This interface, shown in Example 15.4, determines whether an object can be added to, removed from, or included in a particular list. For the purposes of navigation and search, only the matches() method really matters. It determines whether or not any particular object is included in the List returned by getContent(). The canAdd() and canRemove() methods test whether a particular object can be added to or removed from the list respectively. However, in the two default implementations of this interface, ElementFilter and ContentFilter, both of these methods just call matches().

Example 15.4. The JDOM Filter interface

package org.jdom.filter;

public interface Filter {

  public boolean canAdd(Object o);
  public boolean canRemove(Object o);
  public boolean matches(Object o);
    
}

The org.jdom.filter package includes two implementations of this interface, ContentFilter (Example 15.5) and ElementFilter (Example 15.6). The ContentFilter class allows you to specify the visibility of different JDOM node types like ProcessingInstruction and Text. ElementFilter allows you to select elements with certain names or namespaces. Finally, you can write your own custom implementations that filter according to application-specific criteria.

Example 15.5. The ContentFilter class

package org.jdom.filter;

public class ContentFilter implements Filter {

  public static final int ELEMENT   = 1;
  public static final int CDATA     = 2;
  public static final int TEXT      = 4;
  public static final int COMMENT   = 8;
  public static final int PI        = 16;
  public static final int ENTITYREF = 32;
  public static final int DOCUMENT  = 64;

  protected int filterMask;

  public ContentFilter();
  public ContentFilter(boolean allVisible);
  public ContentFilter(int mask);
  
  public int  getFilterMask();
  public void setFilterMask(int mask);
  public void setDefaultMask();
  
  public void setDocumentContent();
  public void setElementContent();
  
  public void setElementVisible(boolean visible);
  public void setCDATAVisible(boolean visible);
  public void setTextVisible(boolean visible);
  public void setCommentVisible(boolean visible);
  public void setPIVisible(boolean visible);
  public void setEntityRefVisible(boolean visible);
  
  public boolean canAdd(Object o);
  public boolean canRemove(Object o);
  public boolean matches(Object o);

  
  public boolean equals(Object o);
  
}

For example, suppose your application only needs to concern itself with elements and text, but can completely skip all comments and processing instructions. You can simplify the code by using an appropriately configured ContentFilter. The most convenient approach is to construct a filter that filters out all nodes by passing false to the constructor, and then turn on only the types you want to let through like this:

// Filter out everything by default
ContentFilter filter = new ContentFilter(false);
// Allow elements through the filter
filter.setElementVisible(true);
// Allow text nodes through the filter
filter.setTextVisible(true);
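The constants in Example 15.5 are distinct powers of two, and the class keeps an int filterMask, which suggests the node types are combined as bit flags. Assuming that is how the visibility setters work (a reasonable reading of the listing, not something it shows directly), the configuration above leaves a mask of ELEMENT | TEXT, which is 5. This stand-alone sketch copies the constant values from Example 15.5 into a hypothetical MaskSketch class to show the arithmetic; it is not JDOM’s source.

```java
// Hypothetical model of a visibility bit mask. The constant values are
// copied from ContentFilter in Example 15.5; the class is not JDOM's.
final class MaskSketch {

  static final int ELEMENT   = 1;
  static final int CDATA     = 2;
  static final int TEXT      = 4;
  static final int COMMENT   = 8;
  static final int PI        = 16;
  static final int ENTITYREF = 32;

  private int mask;

  // Start with nothing visible, like new ContentFilter(false).
  MaskSketch() {
    this.mask = 0;
  }

  // Turn one node type on or off without disturbing the others.
  void setVisible(int type, boolean visible) {
    if (visible) {
      mask |= type;
    } else {
      mask &= ~type;
    }
  }

  boolean isVisible(int type) {
    return (mask & type) != 0;
  }

  int getMask() {
    return mask;
  }
}
```

If that reading is right, the ContentFilter(int mask) constructor listed in Example 15.5 should accept such a combined value directly, as in new ContentFilter(ContentFilter.ELEMENT | ContentFilter.TEXT).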

You’ll need to pass filter to getContent() every time you call it, like so:

  // Shared by every call to process(), so configure it once
  private static final ContentFilter filter = new ContentFilter(false);
  static {
    filter.setElementVisible(true);
    filter.setTextVisible(true);
  }
   
  public static void process(Element element) {
   
    List children = element.getContent(filter);
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        Element child = (Element) o;
        process(child);
      }
      else { // Due to filter, the only other possibility is Text
        Text text = (Text) o;
        handleText(text); // handleText() is your own method
      }
    }
    
  }

You normally want to allow elements to pass the filter, even if you’re only looking at other things like Text. In JDOM, recursing through the Element objects is the only way to search a complete tree. If you filter out the Elements, you won’t be able to go more than one level deep from where you start.

If you only want to select elements, you can use an ElementFilter instead. This can be set up to select all elements, elements with a certain name, elements in a certain namespace, or elements with a certain name in a certain namespace.

Example 15.6. The ElementFilter class

package org.jdom.filter;

public class ElementFilter implements Filter {

  protected String    name;
  protected Namespace namespace;
  
  public ElementFilter();
  public ElementFilter(String name);
  public ElementFilter(Namespace namespace);
  public ElementFilter(String name, Namespace namespace);

  public boolean canAdd(Object o);
  public boolean canRemove(Object o);
  public boolean matches(Object o);

  public boolean equals(Object o);
  
}

For example, this code fragment uses an ElementFilter to create a List named content that only contains XSLT elements:

Namespace xslt = Namespace.getNamespace(
                   "http://www.w3.org/1999/XSL/Transform");   
Filter filter = new ElementFilter(xslt);
List content = element.getContent(filter);

Once again, however, this method proves to be less generally useful than the DOM equivalents because the getContent() method only returns children, not all descendants. For example, you couldn’t really use this to select the XSLT elements or the non-XSLT elements in a stylesheet because each type can appear as children of the other type.
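To find matching elements at any depth, you have to recurse yourself, applying the test at each level and descending into every child whether or not it matched. This sketch uses the JDK’s namespace-aware DOM rather than JDOM (in DOM itself, getElementsByTagNameNS() would do the job directly); the names DescendantSearch, parse, and collect are hypothetical. Like getContent(), it does not test the starting element itself, only what lies below it.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// DOM-based sketch (not JDOM): collect every descendant element in a
// given namespace, even ones nested inside non-matching elements.
final class DescendantSearch {

  static final String XSLT = "http://www.w3.org/1999/XSL/Transform";

  private DescendantSearch() {}

  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);  // needed for getNamespaceURI()
      return factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Depth-first: test each child element, then always recurse into it,
  // because matches can hide underneath non-matching elements.
  static void collect(Element parent, String namespaceURI, List<Element> result) {
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node node = children.item(i);
      if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element child = (Element) node;
        if (namespaceURI.equals(child.getNamespaceURI())) {
          result.add(child);
        }
        collect(child, namespaceURI, result);
      }
    }
  }
}
```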

Filters also work in the Document class, pretty much the same way they work in the Element class. For example, suppose you want to find all the processing instructions in the Document object doc outside the root element. This code fragment creates a List containing those:

// Filter out everything by default
ContentFilter pisOnly = new ContentFilter(false);
// Allow processing instructions through the filter
pisOnly.setPIVisible(true);
// Get the content
List pis = doc.getContent(pisOnly);

If you want something a little more useful, like a filter that selects all xml-stylesheet processing instructions in the prolog only, then you need to write a custom implementation of Filter. Example 15.7 demonstrates.

Example 15.7. A filter for xml-stylesheet processing instructions in the prolog

import org.jdom.filter.Filter;
import org.jdom.*;
import java.util.List;


public class StylesheetFilter implements Filter {

  // This filter is read-only. Nothing can be added or removed.
  public boolean canAdd(Object o) {
    return false; 
  }
  
  public boolean canRemove(Object o) {
    return false;  
  }
  
  public boolean matches(Object o) {
   
    if (o instanceof ProcessingInstruction) {
      ProcessingInstruction pi = (ProcessingInstruction) o; 
      if (pi.getTarget().equals("xml-stylesheet")) {
        // Test to see if we're outside the root element
        if (pi.getParent() == null) {
          Document doc = pi.getDocument();
          // A detached instruction belongs to no document at all
          if (doc == null) return false;
          Element root = doc.getRootElement();
          List content = doc.getContent();
          if (content.indexOf(pi) < content.indexOf(root)) {
            // In prolog
            return true;
          }
        }
      }
    }
    return false;
   
  }
    
}

Adding and removing children

You can append any legal node to an Element using the 6-way overloaded addContent() methods:

public Element addContent(String s);
public Element addContent(Text text)
    throws IllegalAddException;

public Element addContent(Element element)
    throws IllegalAddException;

public Element addContent(ProcessingInstruction instruction)
    throws IllegalAddException;

public Element addContent(EntityRef ref)
    throws IllegalAddException;

public Element addContent(Comment comment)
    throws IllegalAddException;

These methods append their argument to the Element’s child list. Except for addContent(String), they all throw an IllegalAddException if the argument already has a parent element. (The addContent(String) method is just a convenience that creates a new Text node behind the scenes. It does not actually add a String object to the content list.) All of them return the Element object that invoked them, which allows for convenient chaining.

These methods all add the new node to the end of the Element’s list. If you want to insert a node in a different position, you’ll have to retrieve the List object itself. For example, this code fragment builds a channel element by inserting all the child nodes at the beginning of the list in reverse order, using the add(int index, Object o) method:

Element channel     = new Element("channel");
Element link        = new Element("link");
Element description = new Element("description");
Element title       = new Element("title");
title.setText("Slashdot");
link.setText("http://slashdot.org/");
description.setText("News for nerds");

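// getContent() returns the live list, so changes show up in channel
List content = channel.getContent();
// Each add(0, ...) pushes earlier insertions down one slot, so
// inserting in reverse yields the order title, link, description
content.add(0, description);
content.add(0, link);
content.add(0, title);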

There are six removeContent() methods that remove a node from the list, wherever it resides:

public Element removeContent(Text text);
public Element removeContent(CDATA cdata);
public Element removeContent(Element element);
public Element removeContent(ProcessingInstruction instruction);
public Element removeContent(EntityRef ref);
public Element removeContent(Comment comment);

Of course, you can also retrieve the List from the Element with getContent() and remove nodes with the list's own methods, such as remove(int) to delete a node by position or removeAll() to delete a whole collection of nodes at once. However, doing so is relatively rare. Normally you have, or can easily get, a reference to the specific node you want to remove. For example, this deletes the first link child element of the channel element:

channel.removeContent(channel.getChild("link"));

There is currently no method to remove all the content from an Element. Instead, just pass null to setContent(). That is,

element.setContent(null);

Parents and ancestors

So far we've mostly focused on moving down the tree by recursing through the methods that return children. However, JDOM can also move up the tree. [2] As with the child-returning methods, you can only move one level at a time: you can get only the parent directly. To get other ancestor elements, you need to ask for the parent's parent, the parent of the parent's parent, and so forth, until eventually you find an element whose parent is null, which is of course the root of the tree.
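The climb described above is just a loop over parent references. Here it is sketched with a hypothetical Node class (not JDOM's Element) that keeps only a name and a parent, which is all the algorithm needs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type, not JDOM's Element: it keeps only a name and a
// parent pointer, which is all that walking up the tree requires.
public class AncestorWalk {

  static class Node {
    final String name;
    final Node parent;   // null for the root of the tree
    Node(String name, Node parent) { this.name = name; this.parent = parent; }
    Node getParent() { return parent; }
  }

  // Climb one level at a time until getParent() returns null; the node
  // reached there is the root of the tree.
  static Node findRoot(Node node) {
    Node current = node;
    while (current.getParent() != null) {
      current = current.getParent();
    }
    return current;
  }

  // Collect the ancestors, nearest first (parent, grandparent, ..., root).
  static List<String> ancestorNames(Node node) {
    List<String> names = new ArrayList<>();
    for (Node p = node.getParent(); p != null; p = p.getParent()) {
      names.add(p.name);
    }
    return names;
  }

  public static void main(String[] args) {
    Node rdf = new Node("rdf:RDF", null);
    Node channel = new Node("channel", rdf);
    Node title = new Node("title", channel);
    System.out.println(findRoot(title).name);   // rdf:RDF
    System.out.println(ancestorNames(title));   // [channel, rdf:RDF]
  }
}
```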

Each Element object has either zero or one parent. If the Element is the root element of the document (or at least the root of the tree in the event that the Element is not currently part of a Document), then this parent is null. Otherwise it is another Element object. JDOM does not consider the owner document to be the parent of the root element. These three methods enable you to determine whether an Element object represents a root element, and what its parent and owner document are:

public Document getDocument();
public boolean isRootElement();
public Element getParent();

Unlike DOM Elements, JDOM Elements are not irrevocably tied to their owner document. An Element may be in no document at all (in which case getDocument() returns null), and it may be moved from one document to another. However, a JDOM Element cannot have more than one parent at a time. Before you can move an element to a different Document or to a different position in the same Document, you first have to detach it from its current parent by invoking the detach() method:

public Element detach();
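The detach-then-add sequence can be modeled in a few lines. MiniElement below is a hypothetical class, not JDOM's, and IllegalStateException again stands in for IllegalAddException; the point is only the order of operations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical MiniElement, not JDOM's Element: it models only the
// one-parent-at-a-time rule and a detach() that lifts it.
public class DetachSketch {

  static class MiniElement {
    final String name;
    MiniElement parent;                                  // null when detached
    final List<MiniElement> children = new ArrayList<>();

    MiniElement(String name) { this.name = name; }

    // Refuse a child that is still attached somewhere else
    // (IllegalStateException stands in for IllegalAddException).
    MiniElement addContent(MiniElement child) {
      if (child.parent != null) {
        throw new IllegalStateException(child.name + " already has a parent");
      }
      child.parent = this;
      children.add(child);
      return this;
    }

    // Remove this element from its parent's content, if it has a parent,
    // and return it so the result can be passed straight to addContent().
    MiniElement detach() {
      if (parent != null) {
        parent.children.remove(this);
        parent = null;
      }
      return this;
    }
  }

  public static void main(String[] args) {
    MiniElement item = new MiniElement("item");
    MiniElement link = new MiniElement("link");
    item.addContent(link);

    MiniElement linkset = new MiniElement("linkset");
    try {
      linkset.addContent(link);             // link still belongs to item
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    linkset.addContent(link.detach());      // works once link is detached
    System.out.println(item.children.size() + " " + linkset.children.size());  // 0 1
  }
}
```

Because the detach() signature shown above returns the Element itself, detaching and re-adding can likewise be written as a single expression such as newRoot.addContent(link.detach()).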

After you’ve called detach(), you are free to add the Element to any other Element or Document. For example, Example 15.8 loads the XML document at http://www.slashdot.org/slashdot.rdf, detaches all the link elements from that document, and inserts them in a new linkset element, which it then outputs. Without the call to detach(), this would fail with an IllegalAddException.

Example 15.8. Moving elements between documents

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;
import java.util.*;


public class Linkset {
  
  public static void main(String[] args) {
    
    String url = "http://www.slashdot.org/slashdot.rdf";
    
    try {
      SAXBuilder parser = new SAXBuilder();
      
      // Parse the document
      Document document = parser.build(url); 
      Element oldRoot = document.getRootElement();
      Element newRoot = new Element("linkset");
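      Namespace rss = Namespace.getNamespace(
       "http://my.netscape.com/rdf/simple/0.9/");
      // getChildren() returns only the child elements of the root
      List content = oldRoot.getChildren();
      Iterator iterator = content.iterator();
      while (iterator.hasNext()) {
        Element element = (Element) iterator.next(); 
        Element link = element.getChild("link", rss);
        // Not every child of the root is guaranteed to have a link
        if (link != null) {
          // Detach first; otherwise addContent() throws IllegalAddException
          link.detach();
          newRoot.addContent(link);
        }
      }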

      XMLOutputter outputter = new XMLOutputter("  ", true);
      outputter.output(newRoot, System.out);
    }
    catch (JDOMException e) {
      System.out.println(url + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not read " + url
      ); 
    }
     
  } // end main

}

As usual, this only affects the JDOM Document object in memory. It has no effect on the original document read from the remote URL.

Another natural limitation is that an element cannot be its own parent or ancestor, directly or indirectly. Trying to add an element where it would violate this restriction throws an IllegalAddException. You can test whether one element is an ancestor of another using the isAncestor() method:

public boolean isAncestor(Element element);
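Checking for that cycle only requires walking up the parent chain. The sketch below uses a hypothetical Node class, not JDOM; its isAncestor() follows the reading that x.isAncestor(y) asks whether x lies above y in the tree, and addChild() rejects any child that is the receiver itself or one of its ancestors.

```java
// Hypothetical Node, not JDOM's Element: only a parent pointer is needed to
// decide whether adding a child would create a cycle. (Children are not
// tracked, since the check never looks down the tree.)
public class CycleCheck {

  static class Node {
    final String name;
    Node parent;
    Node(String name) { this.name = name; }

    // True if this node appears on other's parent chain, i.e. this is an
    // ancestor of other.
    boolean isAncestor(Node other) {
      for (Node p = other.parent; p != null; p = p.parent) {
        if (p == this) return true;
      }
      return false;
    }

    // Refuse to adopt itself or one of its own ancestors; either would make
    // an element its own ancestor. IllegalStateException stands in for
    // IllegalAddException.
    Node addChild(Node child) {
      if (child == this || child.isAncestor(this)) {
        throw new IllegalStateException("cycle: " + child.name
            + " is " + name + " or one of its ancestors");
      }
      if (child.parent != null) {
        throw new IllegalStateException(child.name + " already has a parent");
      }
      child.parent = this;
      return this;
    }
  }

  public static void main(String[] args) {
    Node a = new Node("a");
    Node b = new Node("b");
    Node c = new Node("c");
    a.addChild(b);
    b.addChild(c);
    System.out.println(a.isAncestor(c));   // true
    System.out.println(c.isAncestor(a));   // false
    try {
      c.addChild(a);                       // a is c's grandparent
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Note that a has no parent, so the one-parent rule alone would not stop c.addChild(a); only the ancestor test catches it.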

Attributes

The Element class has thirteen methods that read and write the values of the various attributes of the element. Except for certain unusual cases (mostly involving attribute types) these thirteen methods are all that’s needed to handle attributes. You rarely need to concern yourself with the Attribute class directly.

public Attribute getAttribute(String name);
public Attribute getAttribute(String name, Namespace namespace);
public String getAttributeValue(String name);
public String getAttributeValue(String name, Namespace namespace);
public String getAttributeValue(String name, String def);
public String getAttributeValue(String name, Namespace namespace, String def);
public Element setAttributes(List attributes)
    throws IllegalAddException;

public Element setAttribute(String name, String value)
    throws IllegalNameException, IllegalDataException;

public Element setAttribute(String name, String value, Namespace namespace)
    throws IllegalNameException, IllegalDataException;

public Element setAttribute(Attribute attribute)
    throws IllegalAddException;

public boolean removeAttribute(String name);
public boolean removeAttribute(String name, Namespace namespace);
public boolean removeAttribute(Attribute attribute);

These methods all follow the same basic rules. If an attribute is in a namespace, specify the local name and namespace to access it. If the attribute is not in a namespace, use only the name. The setters must also specify the value to set the attribute to. The getters may optionally specify a default value used if the attribute is not found. Alternatively, you can work with an Attribute object in place of the name, namespace, and value strings. Most of the time, however, strings are more convenient.

The getAttributeValue() methods all return the String value of the attribute. If the attribute was read by a parser, the value will be normalized according to its type. However, attributes added in-memory with setAttribute() and its ilk will not be normalized. The setter methods all return the Element object itself so they can be used in a chain. The remove methods all return a boolean, true if the attribute was removed, false if it wasn’t.
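These lookup rules are easy to see in a toy model. The hypothetical AttributeSketch class below is not part of JDOM: it uses plain URI strings instead of Namespace objects and treats "" as "no namespace". Attributes are keyed by namespace URI plus local name, getters fall back to a caller-supplied default, setters chain, and removal reports whether anything was removed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical attribute store, not the JDOM API. Namespaces are plain URI
// strings here, with "" meaning "no namespace" (an assumption of this sketch).
public class AttributeSketch {

  private final Map<String, String> attributes = new LinkedHashMap<>();

  // Key on namespace URI plus local name ("{uri}local", Clark notation);
  // the prefix never takes part in the lookup.
  private static String key(String localName, String namespaceURI) {
    return "{" + namespaceURI + "}" + localName;
  }

  // Setters return this so they can be chained.
  public AttributeSketch setAttribute(String name, String value) {
    return setAttribute(name, value, "");
  }

  public AttributeSketch setAttribute(String name, String value, String namespaceURI) {
    attributes.put(key(name, namespaceURI), value);
    return this;
  }

  // Getters return null, or the caller's default, when nothing matches.
  public String getAttributeValue(String name) {
    return getAttributeValue(name, "", null);
  }

  public String getAttributeValue(String name, String namespaceURI, String def) {
    String value = attributes.get(key(name, namespaceURI));
    return value == null ? def : value;
  }

  // Returns true only if an attribute was actually removed.
  public boolean removeAttribute(String name, String namespaceURI) {
    return attributes.remove(key(name, namespaceURI)) != null;
  }

  public static void main(String[] args) {
    String xlink = "http://www.w3.org/1999/xlink";
    AttributeSketch resource = new AttributeSketch()
        .setAttribute("title", "RDDL Natures", xlink)
        .setAttribute("href", "http://www.rddl.org/natures", xlink);
    System.out.println(resource.getAttributeValue("href", xlink, "none"));
    System.out.println(resource.getAttributeValue("href", "", "none"));  // none: no un-namespaced href
    System.out.println(resource.removeAttribute("href", xlink));          // true
    System.out.println(resource.removeAttribute("href", xlink));          // false
  }
}
```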

As with most other constructs, JDOM checks all the attributes you set for well-formedness and throws an exception if anything looks amiss. In particular,

  • The local name must be a non-colonized name.

  • The value can't contain any characters that XML forbids, such as the null character (U+0000) or the noncharacters U+FFFE and U+FFFF.

  • The attribute cannot be a namespace declaration such as xmlns or xmlns:prefix. (JDOM stores these separately.)
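A checker for those three rules might look roughly like the sketch below. It is hypothetical and simpler than JDOM's own verification: the value test follows the XML 1.0 Char production exactly, but the name test only approximates the NCName rules with Character.isLetter() and Character.isLetterOrDigit(). It returns a reason string instead of throwing an exception.

```java
// Hypothetical checker, not JDOM's own code: a simplified version of the
// three rules listed above.
public class AttributeChecks {

  // Returns null if the attribute passes, otherwise a reason it was rejected.
  static String check(String localName, String value) {
    if (localName.equals("xmlns") || localName.startsWith("xmlns:")) {
      return "namespace declarations are not attributes";
    }
    if (localName.isEmpty() || localName.indexOf(':') >= 0) {
      return "local name must be a non-empty, non-colonized name";
    }
    // Approximation of NCName: letter or underscore first, then letters,
    // digits, '.', '-' or '_'
    char first = localName.charAt(0);
    if (!(Character.isLetter(first) || first == '_')) {
      return "local name must start with a letter or underscore";
    }
    for (int i = 1; i < localName.length(); i++) {
      char c = localName.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_')) {
        return "illegal name character: " + c;
      }
    }
    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] |
    // [#xE000-#xFFFD] | [#x10000-#x10FFFF]; lone surrogates fail too
    for (int i = 0; i < value.length(); ) {
      int cp = value.codePointAt(i);
      boolean legal = cp == 0x9 || cp == 0xA || cp == 0xD
          || (cp >= 0x20 && cp <= 0xD7FF)
          || (cp >= 0xE000 && cp <= 0xFFFD)
          || (cp >= 0x10000 && cp <= 0x10FFFF);
      if (!legal) {
        return String.format("illegal character U+%04X in value", cp);
      }
      i += Character.charCount(cp);
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(check("href", "http://www.rddl.org/"));   // null
    System.out.println(check("xlink:href", "x"));                // rejected: colonized
    System.out.println(check("xmlns", "http://www.w3.org/1999/xlink"));
    System.out.println(check("title", "bad\u0000value"));        // rejected: U+0000
  }
}
```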

For example, let's suppose you want to process an RDDL document to find resources related to a particular namespace URI. Each of these is enclosed in a rddl:resource element like this one from the RDDL specification itself:

<rddl:resource xlink:type="simple"
        xlink:title="RDDL Natures"
        xlink:role="http://www.rddl.org/"
        xlink:arcrole="http://www.rddl.org/purposes#directory"
        xlink:href="http://www.rddl.org/natures"
>
<div class="resource">
<p>It is anticipated that many related-resource natures will be 
   well known. A list of well-known natures may be found in the 
   RDDL directory <a href=
   "http://www.rddl.org/natures">http://www.rddl.org/natures</a>.
</p>
</div>
</rddl:resource>

All the information required to locate the resources is included in the attributes of the rddl:resource elements. The rest of the content in the document is relevant only to a browser showing the document to a human reader. Most software will want to read these rddl:resource elements and ignore the rest of the document. Example 15.9 is such a program. It searches a document for related resources and outputs an HTML table containing their information. The xlink:href attribute becomes an HTML hyperlink. The other URLs in the xlink:role and xlink:arcrole attributes are purely descriptive (like namespace URLs) and not intended to be resolved, so they’re merely output as plain text.

Example 15.9. Searching for RDDL resources

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;
import java.io.IOException;


public class RDDLLister {
  
  public final static Namespace XLINK_NAMESPACE = 
   Namespace.getNamespace("xl", "http://www.w3.org/1999/xlink");
  public final static String RDDL_NAMESPACE 
   = "http://www.rddl.org/";

  public static void main(String[] args) {
    
    if (args.length <= 0) {
      System.out.println("Usage: java RDDLLister url");
      return; 
    }
    
    SAXBuilder builder = new SAXBuilder();
    
    try {
      // Prepare the output document
      Element html = new Element("html");
      Element body = new Element("body");
      Element table = new Element("table");
      html.addContent(body);
      body.addContent(table);
      Document output = new Document(html);
      
      // Read the entire document into memory
      Document doc = builder.build(args[0]);
      Element root = doc.getRootElement();
      processElement(root, table); 
      
      // Serialize the output document
      XMLOutputter outputter = new XMLOutputter("  ", true);
      outputter.output(output, System.out);
      
    }
    catch (JDOMException e) {
      System.err.println(e); 
    }
    catch (IOException e) {
      System.err.println(e); 
    }
        
  } // end main

  public static void processElement(Element input, Element output) {
    
    if (input.getName().equals("resource") 
     && input.getNamespaceURI().equals(RDDL_NAMESPACE)) {
     
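       // Default to "" so that setText() and setAttribute() never see
       // null when a resource omits one of these attributes
       String href    = input.getAttributeValue("href", XLINK_NAMESPACE, "");
       String title   = input.getAttributeValue("title", XLINK_NAMESPACE, "");
       String role    = input.getAttributeValue("role", XLINK_NAMESPACE, "");
       String arcrole = input.getAttributeValue("arcrole", XLINK_NAMESPACE, "");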
     
       // Wrap this up in a table row
       Element tr = new Element("tr");
       
       Element titleCell = new Element("td");
       titleCell.setText(title);
       tr.addContent(titleCell);
       
       Element hrefCell = new Element("td");
       Element a = new Element("a");
       a.setAttribute("href", href);
       a.setText(href);
       hrefCell.addContent(a);
       tr.addContent(hrefCell);
       
       Element roleCell = new Element("td");
       roleCell.setText(role);
       tr.addContent(roleCell);
       
       Element arcroleCell = new Element("td");
       arcroleCell.setText(arcrole);
       tr.addContent(arcroleCell);
  
       output.addContent(tr);       
     
    }
    
    // Recurse
    List content = input.getContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        processElement((Element) o, output);   
      }
    } // end while
    
  }

}

The main() method builds the general outline of a well-formed HTML document, and then parses the input RDDL document in the usual fashion. It retrieves the root element with getRootElement() and then passes this root element and the table element to the processElement() method.

First, processElement() checks whether the element is a rddl:resource element. If it is, processElement() extracts the four XLink attributes using getAttributeValue(). Each value is then inserted in a td element; the td elements are appended to a tr element, which is added to the table element. The setAttribute() method attaches an href attribute to the a element that defines the HTML link. Finally, processElement() is invoked on all child elements of the current element to find any rddl:resource elements deeper in the tree.

Here’s the beginning of the output I got when I ran this program against the RDDL specification itself:

D:\books\XMLJAVA>java RDDLLister http://www.rddl.org
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <body>
    <table>
      <tr>
        <td>RDDL Natures</td>
        <td>
          <a href="http://www.rddl.org/natures">
           http://www.rddl.org/natures</a>
        </td>
        <td>http://www.rddl.org/</td>
        <td>http://www.rddl.org/purposes#directory</td>
      </tr>
      <tr>
        <td>RDDL Purposes</td>
        <td>
          <a href="http://www.rddl.org/purposes">
           http://www.rddl.org/purposes</a>
        </td>
        <td>http://www.rddl.org/</td>
        <td>http://www.rddl.org/purposes#directory</td>
      </tr>
…


[1] The name is a little misleading. An earlier beta called these methods getChildElements() and removeChildElements(), much better names in my opinion.

[2] Sideways movement, e.g. getting the previous or next sibling, is noticeably lacking. For this, you normally use List and Iterator.


Copyright 2001, 2002 Elliotte Rusty Harold elharo@metalab.unc.edu
Last Modified December 10, 2002