The Text Class

JDOM uses the Text class internally to represent text nodes. In normal usage you don’t deal with this class directly; you just use strings. The one place you are likely to encounter it is when you call getContent() to retrieve all the children of an element and iterate through the returned list. In that case, the text children appear as Text objects.

Each Text object has a parent Element (which may be null) and a String value that holds the content of the node. This value may contain characters such as < and &. If so, they are escaped when the node is serialized. However, they do not need to be escaped before they are inserted into a Text object.
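To make the escaping rule concrete, here is a rough, library-free sketch of the kind of substitution a serializer applies to text content on output. It is not JDOM’s code; the class EscapeSketch and the method escapeText() are hypothetical names invented for this illustration. The point is only that the raw string is stored unescaped and the entity references appear at output time.

```java
class EscapeSketch {

  // Hypothetical helper, not part of JDOM. Replaces & and < (which must
  // always be escaped in text content) and also > for safety.
  static String escapeText(String s) {
    StringBuffer out = new StringBuffer(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '&') out.append("&amp;");
      else if (c == '<') out.append("&lt;");
      else if (c == '>') out.append("&gt;");
      else out.append(c);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // The in-memory value is kept exactly as written, unescaped
    String raw = "AT&T <rocks>";
    // Escaping happens only when the text is written out
    System.out.println(escapeText(raw));
  }

}
```

Escaping the string yourself before storing it would cause it to be escaped a second time on output, which is why the raw characters go in as is.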

The Text class, summarized in Example 15.11, has methods to get, set, and detach the parent Element, to get and set the text content as a String, to append more text to the node, and to get the content of the node after trimming or normalizing white space. And of course, it has the other usual Java methods such as equals(), hashCode(), and clone() that all JDOM objects possess.

Example 15.11. The JDOM Text class

package org.jdom;

public class Text implements Serializable, Cloneable {

  protected String value;
  protected Object parent;

  protected Text();
  public    Text(String s);
  
  public String getText();
  public String getTextTrim();
  public String getTextNormalize();

  public static String normalizeString(String s);

  public Text     setText(String s);
  public void     append(String s);
  public void     append(Text text);
  public Element  getParent();
  public Document getDocument();
  protected Text  setParent(Element parent);
  public Text     detach();

  public       String  toString();
  public final int     hashCode();
  public final boolean equals(Object ob);
  public       Object  clone();
  
}
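The difference between trimming and normalizing is easy to blur, so here is a small standalone sketch. It assumes that trimming strips only leading and trailing white space, and that normalizing additionally collapses each interior run of white space to a single space. It does not call JDOM; WhitespaceSketch and its methods are hypothetical stand-ins, not the library’s implementation.

```java
class WhitespaceSketch {

  // XML white space is limited to space, tab, carriage return, and line feed
  static boolean isXMLWhitespace(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
  }

  // Strip leading and trailing white space only
  static String trim(String s) {
    int start = 0;
    int end = s.length();
    while (start < end && isXMLWhitespace(s.charAt(start))) start++;
    while (end > start && isXMLWhitespace(s.charAt(end - 1))) end--;
    return s.substring(start, end);
  }

  // Trim, then collapse each interior run of white space to one space
  static String normalize(String s) {
    String trimmed = trim(s);
    StringBuffer out = new StringBuffer(trimmed.length());
    boolean inRun = false;
    for (int i = 0; i < trimmed.length(); i++) {
      char c = trimmed.charAt(i);
      if (isXMLWhitespace(c)) {
        if (!inRun) out.append(' ');
        inRun = true;
      }
      else {
        out.append(c);
        inRun = false;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String value = "\n    Gus's  Crawfish\n  ";
    System.out.println("[" + trim(value) + "]");      // interior spaces kept
    System.out.println("[" + normalize(value) + "]"); // interior run collapsed
  }

}
```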

JDOM does not guarantee that each run of text is represented by a single text node. Text objects can be adjacent to each other. This can make it a little tricky to retrieve the complete content of an element. For example, consider this element:

  <vendor>
    Gus's  Crawfish
  </vendor>

Just from looking at the XML, there’s no way to say whether the Element object representing the vendor element contains one Text object or two. Indeed, in extreme cases, it may contain three, four, or even more. If this element was read by SAXBuilder, then JDOM does use a single Text object. However, if it was created or modified in memory by a program, then all bets are off.

In fact, you’d need to concern yourself with this even if JDOM did not allow adjacent text nodes. For example, consider this element:

  <vendor>
    Gus's <!-- This is my brother-in-law. My wife asked me to
         throw him some business. --> Crawfish
  </vendor>

The text content of the vendor element is the same as before. However, now there’s no way for JDOM to represent it as a single Text object.

You must also consider the case where an element contains child elements such as this one:

  <vendor>
    Gus's <seafood>Crawfish</seafood>
  </vendor>

To accumulate the complete text of an element you need to iterate through its children, while recursively processing any element children. This getFullText() method demonstrates:

  public static String getFullText(Element element) {
  
    StringBuffer result = new StringBuffer();
    List content = element.getContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Text) {
        Text t = (Text) o;
        result.append(t.getText());
      }
      else if (o instanceof Element) {
        Element child = (Element) o;
        result.append(getFullText(child));
      }
    }
  
    return result.toString();
  
  }
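Running that method requires the JDOM jar, so here is a library-free sketch of the same traversal that can be tried on its own. It is only a model: plain Strings stand in for Text objects, nested Lists stand in for child Elements, and the hypothetical CommentStandIn class stands in for a comment. None of these names come from JDOM.

```java
import java.util.Arrays;
import java.util.List;

class FullTextSketch {

  // Hypothetical stand-in for a comment node; it contributes no text
  static class CommentStandIn {
    final String data;
    CommentStandIn(String data) { this.data = data; }
  }

  // Strings play the role of Text objects and nested Lists the role of
  // child Elements. Anything else, such as a comment, is skipped.
  static String fullText(List<?> content) {
    StringBuffer result = new StringBuffer();
    for (Object o : content) {
      if (o instanceof String) {
        result.append((String) o);
      }
      else if (o instanceof List) {
        result.append(fullText((List<?>) o)); // recurse into the child
      }
    }
    return result.toString();
  }

  // One text node
  static List<Object> oneText() {
    return Arrays.<Object>asList("Gus's  Crawfish");
  }

  // Two adjacent text nodes
  static List<Object> adjacentTexts() {
    return Arrays.<Object>asList("Gus's ", " Crawfish");
  }

  // Two text nodes separated by a comment
  static List<Object> withComment() {
    return Arrays.<Object>asList(
     "Gus's ", new CommentStandIn("brother-in-law"), " Crawfish");
  }

  // A text node followed by a child element that has its own text
  static List<Object> withChild() {
    return Arrays.<Object>asList("Gus's ", Arrays.<Object>asList("Crawfish"));
  }

  public static void main(String[] args) {
    System.out.println(fullText(oneText()));
    System.out.println(fullText(adjacentTexts()));
    System.out.println(fullText(withComment()));
    System.out.println(fullText(withChild()));
  }

}
```

The first three content lists produce exactly the same string, however the text happens to be divided into nodes. The last differs only by one space, because the child element’s text directly follows the space that ends the preceding text.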

Chapter 11 demonstrated a program that encoded all the text of a document, but not its markup, in ROT-13 using DOM. Let’s repeat that example here, but now with JDOM instead. You can compare it to Example 11.8 to get a good feeling for the differences between DOM and JDOM. The DOM version is significantly more complex, especially when it comes to building the document and then serializing it.

Example 15.12. JDOM-based ROT13 encoder for XML documents

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;


public class ROT13XML {

  // note use of recursion
  public static void encode(Element element) {
    
    List content = element.getContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Text) {
        Text t = (Text) o;
        String cipherText = rot13(t.getText());
        t.setText(cipherText);
      }
      else if (o instanceof Element) {
        Element child = (Element) o;
        encode(child);
      }
    }
    
  }
  
  public static String rot13(String s) {
    
    StringBuffer out = new StringBuffer(s.length());
    for (int i = 0; i < s.length(); i++) {
      int c = s.charAt(i);
      if (c >= 'A' && c <= 'M') out.append((char) (c+13));
      else if (c >= 'N' && c <= 'Z') out.append((char) (c-13));
      else if (c >= 'a' && c <= 'm') out.append((char) (c+13));
      else if (c >= 'n' && c <= 'z') out.append((char) (c-13));
      else out.append((char) c);
    } 
    return out.toString();
    
  }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java ROT13XML URL");
      return;
    }
    
    String url = args[0];
    
    try {
      SAXBuilder parser = new SAXBuilder();
      
      // Read the document
      Document document = parser.build(url); 
      
      // Modify the document
      ROT13XML.encode(document.getRootElement());

      // Write it out again
      XMLOutputter outputter = new XMLOutputter();
      outputter.output(document, System.out);
    }
    catch (JDOMException e) {
      System.out.println(url + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not encode " + url
      ); 
    }
     
  } // end main

}

Here’s a joke encoded by this program. You’ll have to run the program if you want to find out what it says. :-)

D:\books\XMLJAVA>java ROT13XML joke.xml
<?xml version="1.0" encoding="UTF-8"?>
<joke>
  Gur qrsvavgvba bs n yvoregnevna vf n pbafreingvir
  haqre vaqvpgzrag.
</joke>
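ROT-13 shifts each letter by half the alphabet, so applying it twice restores the original text. That is why the same program serves as both encoder and decoder. The following standalone sketch copies the rot13() logic from Example 15.12 into a hypothetical class, Rot13Check, so that this property can be checked without JDOM or an input document.

```java
class Rot13Check {

  // Same cipher as rot13() in Example 15.12
  static String rot13(String s) {
    StringBuffer out = new StringBuffer(s.length());
    for (int i = 0; i < s.length(); i++) {
      int c = s.charAt(i);
      if (c >= 'A' && c <= 'M') out.append((char) (c+13));
      else if (c >= 'N' && c <= 'Z') out.append((char) (c-13));
      else if (c >= 'a' && c <= 'm') out.append((char) (c+13));
      else if (c >= 'n' && c <= 'z') out.append((char) (c-13));
      else out.append((char) c); // non-letters pass through unchanged
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String plain = "Hello, XML!";
    String cipher = rot13(plain);
    System.out.println(cipher);        // encoded once
    System.out.println(rot13(cipher)); // encoded twice: the original again
  }

}
```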

Copyright 2001, 2002 Elliotte Rusty Harold (elharo@metalab.unc.edu). Last Modified July 18, 2002