The CDATASection Interface

The CDATASection interface, shown in Example 11.11, is a subinterface of Text that specifically represents CDATA sections. It has no unique methods of its own. However, when a CDATASection is serialized into a file, the text of the node may be wrapped inside CDATA section markers so that characters like & and < do not need to be escaped as &amp; and &lt;.

Example 11.11. The CDATASection interface

package org.w3c.dom;

public interface CDATASection extends Text {

}
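Because CDATASection adds nothing to Text, code that works with Text nodes also works with CDATA sections. The following is a minimal sketch of that point, assuming a JAXP parser is available on the class path. The class name CDATASectionDemo and the method makeSection() are hypothetical names used only for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Hypothetical demo class; not part of the DOM API
public class CDATASectionDemo {

  // Builds <Code><![CDATA[content]]></Code> in memory
  // and returns the CDATA section node
  public static CDATASection makeSection(String content) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
      Element code = doc.createElement("Code");
      doc.appendChild(code);
      CDATASection section = doc.createCDATASection(content);
      code.appendChild(section);
      return section;
    }
    catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP parser available", ex);
    }
  }

  public static void main(String[] args) {
    CDATASection section = makeSection("if (a < b && b < c)");

    // A CDATASection can be used anywhere a Text is expected
    Text text = section;

    // getData() returns the raw characters, with no escaping
    System.out.println(text.getData());

    // Only the node type distinguishes it from an ordinary text node
    System.out.println(
      section.getNodeType() == Node.CDATA_SECTION_NODE);
  }

}
```

The data stored in the node is the unescaped text. Whether it is written back out with CDATA markers or with &lt; and &amp; escapes is up to the serializer.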

CDATA sections are convenient syntactic sugar for documents that will sometimes be read or authored by human beings in source code form. The source code for this book uses them frequently for examples. Please don’t use CDATA sections for more than that. With the possible exception of editors, all programs that process XML documents should treat a CDATA section as identical to the same text with all the less-than signs changed to &lt; and all the ampersands changed to &amp;. In particular, do not use CDATA sections as a sort of pseudo-element to hide HTML in your XML documents like this:

  <Product>
    <Name>Brass Ship's Bell</Name>
    <Quantity>1</Quantity>
    <Price currency="USD">144.95</Price>
    <Discount>.10</Discount>
    <![CDATA[<html><body>
      <b>Happy Father&rsquo;s Day to a great Dad!<P></b>
      
      <i>Love,<br>
      Sam and Beatrice<body></html>]]>
  </Product>

Instead, write well-formed HTML inside an appropriate element like this:

  <Product>
    <Name>Brass Ship's Bell</Name>
    <Quantity>1</Quantity>
    <Price currency="USD">144.95</Price>
    <Discount>.10</Discount>
    <GiftMessage><html><body>
      <p><b>Happy Father's Day to a great Dad!</b></p>
      
      <i>Love,<br />
      Sam and Beatrice</i></body></html>
    </GiftMessage>
  </Product>

This is much more flexible and much more robust. DOM parsers are not required to report CDATA sections, and other processes are even less likely to maintain them, so you should not use CDATA sections as a substitute for elements.
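One way to see that a program cannot count on CDATA sections surviving is JAXP's setCoalescing() method in DocumentBuilderFactory, which asks the parser to convert CDATA sections to text and merge them with adjacent text nodes. The sketch below parses the same document with and without coalescing. The class name CoalescingDemo and its two helper methods are hypothetical, and whether a CDATASection node is reported when coalescing is off depends on the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of JAXP or DOM
public class CoalescingDemo {

  // Parses the XML string with coalescing turned on or off
  public static Document parse(String xml, boolean coalescing) {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setCoalescing(coalescing);
      return factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (ParserConfigurationException | SAXException
     | IOException ex) {
      throw new IllegalStateException(ex);
    }
  }

  // Returns the node type of the root element's first child
  public static short firstChildType(String xml, boolean coalescing) {
    Node root = parse(xml, coalescing).getDocumentElement();
    return root.getFirstChild().getNodeType();
  }

  public static void main(String[] args) {
    String xml = "<Code><![CDATA[a < b]]></Code>";

    // With coalescing on, the content arrives as a plain text node
    System.out.println(
      firstChildType(xml, true) == Node.TEXT_NODE);

    // With coalescing off, the result is parser dependent
    System.out.println(firstChildType(xml, false));

    // Either way, the character content is the same
    System.out.println(
      parse(xml, true).getDocumentElement().getTextContent());
  }

}
```

Code that only looks at the text content behaves the same under both settings, which is the behavior to aim for.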

The normalize() method in the Node interface does not combine CDATA sections with adjacent text nodes or other CDATA sections. Example 11.12 provides a static utility method that does do this. A Node is passed in as an argument. All CDATASection descendants of this node are converted to simple Text objects and then all adjacent Text objects are merged. The argument is modified in place. Thus the method returns void.
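The limitation of normalize() can be checked with a small experiment. This is only a sketch, assuming a JAXP parser is available. The class name NormalizeDemo and the method childCountAfterNormalize() are hypothetical. It places a node between two text nodes, calls normalize(), and counts the children that remain.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of the DOM API
public class NormalizeDemo {

  // Builds <Root> with three adjacent character data children,
  // normalizes it, and returns the number of children left.
  // If useCDATA is true the middle child is a CDATA section;
  // otherwise it is an ordinary text node.
  public static int childCountAfterNormalize(boolean useCDATA) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
    }
    catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP parser available", ex);
    }

    Element root = doc.createElement("Root");
    doc.appendChild(root);
    root.appendChild(doc.createTextNode("before "));
    if (useCDATA) {
      root.appendChild(doc.createCDATASection("a < b"));
    }
    else {
      root.appendChild(doc.createTextNode("a < b"));
    }
    root.appendChild(doc.createTextNode(" after"));

    root.normalize();
    return root.getChildNodes().getLength();
  }

  public static void main(String[] args) {
    // Three plain text nodes collapse into one
    System.out.println(childCountAfterNormalize(false));
    // The CDATA section keeps its neighbors apart
    System.out.println(childCountAfterNormalize(true));
  }

}
```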

Example 11.12. Merging CDATA sections with text nodes

import org.w3c.dom.*;


public class CDATAUtility {

  // Recursively descend the tree converting all CDATA sections
  // to text nodes and merging them with adjacent text nodes.
  public static void superNormalize(Node parent) {
    
    // We'll need this to create new Text objects
    Document factory = parent.getOwnerDocument();
      
    Node current = parent.getFirstChild();
    while (current != null) {
      
      int type = current.getNodeType();
      if (type == Node.CDATA_SECTION_NODE) {
        // Convert CDATA section to a text node
        CDATASection cdata = (CDATASection) current;
        String data = cdata.getData();
        Text newNode = factory.createTextNode(data);
        parent.replaceChild(newNode, cdata);
        current = newNode;
      }
      
      // Recheck in case we changed type above
      type = current.getNodeType();
      if (type == Node.TEXT_NODE) {
        // If previous node is a text node, then append this 
        // node's data to that node, and delete this node
        Node previous = current.getPreviousSibling();
        if (previous != null) {
          int previousType = previous.getNodeType(); 
          if (previousType == Node.TEXT_NODE) {
            Text previousText = (Text) previous;
            Text currentText = (Text) current;
            String data = currentText.getData();
            previousText.appendData(data);
            parent.removeChild(current);
            current = previous;
          }
        }
      } // end if 
      else { // recurse 
        superNormalize(current);
      }
      
      // increment node
      current = current.getNextSibling();
      
    } // end while
    
  }  // end superNormalize()
  
}

More than anything else, superNormalize() is an exercise in navigating the DOM tree. It uses the Node methods getFirstChild(), getNextSibling(), and getPreviousSibling() in a while loop instead of iterating through a NodeList in a for loop because it’s constantly changing the contents of the node list. Node lists are live, but keeping the loop counter pointed at the right node as the list changes is tricky (not impossible certainly, just not as straightforward as the approach used here).
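For comparison, here is a sketch of what the NodeList version might look like. It is not the approach this chapter recommends, and the names IndexedCDATAUtility, superNormalizeByIndex(), and buildSample() are hypothetical. The delicate part is the index adjustment after a node is removed from the live list.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.*;

// Hypothetical alternative to CDATAUtility, for illustration only
public class IndexedCDATAUtility {

  public static void superNormalizeByIndex(Node parent) {

    Document factory = parent.getOwnerDocument();
    NodeList children = parent.getChildNodes(); // live list

    for (int i = 0; i < children.getLength(); i++) {
      Node current = children.item(i);

      if (current.getNodeType() == Node.CDATA_SECTION_NODE) {
        // Replacing a node does not change the list length,
        // so index i is still correct afterward
        String data = ((CDATASection) current).getData();
        Text newNode = factory.createTextNode(data);
        parent.replaceChild(newNode, current);
        current = newNode;
      }

      if (current.getNodeType() == Node.TEXT_NODE) {
        if (i > 0) {
          Node previous = children.item(i - 1);
          if (previous.getNodeType() == Node.TEXT_NODE) {
            ((Text) previous).appendData(((Text) current).getData());
            parent.removeChild(current);
            // The list just shrank by one. Step back so that
            // the loop's i++ lands on the node that slid into
            // the removed node's position.
            i--;
          }
        }
      }
      else { // recurse
        superNormalizeByIndex(current);
      }
    }

  }

  // Builds a small test tree:
  // <Root>a<![CDATA[<b>]]>c<Child><![CDATA[x]]><![CDATA[y]]></Child></Root>
  public static Element buildSample() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element root = doc.createElement("Root");
      doc.appendChild(root);
      root.appendChild(doc.createTextNode("a"));
      root.appendChild(doc.createCDATASection("<b>"));
      root.appendChild(doc.createTextNode("c"));
      Element child = doc.createElement("Child");
      root.appendChild(child);
      child.appendChild(doc.createCDATASection("x"));
      child.appendChild(doc.createCDATASection("y"));
      return root;
    }
    catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP parser available", ex);
    }
  }

  public static void main(String[] args) {
    Element root = buildSample();
    superNormalizeByIndex(root);
    System.out.println(root.getChildNodes().getLength());
    System.out.println(root.getFirstChild().getNodeValue());
  }

}
```

Forgetting the i-- after removeChild() would skip the node that follows each merged node, which is the kind of mistake the sibling-based loop avoids.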


Copyright 2001, 2002 Elliotte Rusty Harold (elharo@metalab.unc.edu). Last Modified July 29, 2002