The Comment Interface

XML comments don’t have a lot of structure. They’re really just undifferentiated text between <!-- and -->. Therefore, the Comment interface, shown in Example 11.19, is a subinterface of CharacterData that declares no methods of its own; everything it offers is inherited from CharacterData. However, your code can use the type to determine that a node is a comment and treat it appropriately. Serializers will be smart enough to output a Comment with the right markup around it.

Example 11.19. The Comment interface

package org.w3c.dom;

public interface Comment extends CharacterData {

}
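
Because Comment adds nothing to CharacterData, working with a comment node mostly means calling inherited methods and checking the node type. The following is a minimal sketch of that idea, not part of the original example set. The class name CommentAsCharacterData and its helper methods are hypothetical, and it assumes a JAXP identity Transformer is an acceptable stand-in for a serializer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; the name is an assumption for illustration
public class CommentAsCharacterData {

  // Builds <root><!-- draft --></root> in memory and returns the comment
  public static Comment makeComment() throws ParserConfigurationException {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    Element root = doc.createElement("root");
    doc.appendChild(root);
    Comment comment = doc.createComment(" draft ");
    root.appendChild(comment);
    return comment;
  }

  // Assumption: an identity transform serves as the serializer here
  public static String serialize(Node node) throws TransformerException {
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    identity.transform(new DOMSource(node), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    Comment comment = makeComment();

    // Everything called here is inherited from CharacterData or Node
    comment.appendData("only ");
    System.out.println(comment.getLength());          // 12
    System.out.println(comment.substringData(1, 5));  // draft

    // Two ways to recognize a comment node
    System.out.println(comment.getNodeType() == Node.COMMENT_NODE);
    System.out.println(comment instanceof CharacterData);
    System.out.println(comment.getNodeName());        // #comment

    // The serializer supplies the <!-- and --> delimiters
    System.out.println(serialize(comment.getOwnerDocument()));
  }

}
```

The data of a comment never includes the <!-- and --> delimiters. Those appear only when the node is serialized.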

Earlier in Chapter 7, I demonstrated a SAX program that read comments. Now in Example 11.20 you can see the DOM equivalent. The approach is different (actively walking a tree instead of passively receiving events), but the effect is the same: the program prints the contents of comments, and only comments, on System.out.

Example 11.20. Printing comments

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.IOException;


public class DOMCommentReader {

  // note use of recursion
  public static void printComments(Node node) {
    
    int type = node.getNodeType();
    if (type == Node.COMMENT_NODE) {
      Comment comment = (Comment) node;
      System.out.println(comment.getData());
      System.out.println();
    }
    else {
      if (node.hasChildNodes()) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          printComments(children.item(i));
        } 
      }
    }
    
  }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java DOMCommentReader URL");
      return;
    }
    
    String url = args[0];
    
    try {
      DocumentBuilderFactory factory 
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      
      // Read the document
      Document document = parser.parse(url); 
      
      // Process the document
      DOMCommentReader.printComments(document);

    }
    catch (SAXException e) {
      System.out.println(url + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not check " + url
      ); 
    }
    catch (FactoryConfigurationError e) { 
      System.out.println("Could not locate a factory class"); 
    }
    catch (ParserConfigurationException e) { 
      System.out.println("Could not locate a JAXP parser"); 
    }
     
  } // end main  
  
}
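
If you want to experiment with the recursive walk without a network connection, the same traversal can be pointed at a document held in a string. This is only a sketch under a few assumptions: the class name CommentCollector and its methods are hypothetical, it collects the comment text into a list instead of printing it directly, and it uses generics, which require a newer Java version than Example 11.20 does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical variant of DOMCommentReader that returns the comments
public class CommentCollector {

  // Same recursion as printComments(), but appends to a list
  public static void collect(Node node, List<String> result) {
    if (node.getNodeType() == Node.COMMENT_NODE) {
      result.add(((Comment) node).getData());
    }
    else {
      // getChildNodes() returns an empty list for childless nodes
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        collect(children.item(i), result);
      }
    }
  }

  // Parses the XML text and returns its comments in document order
  public static List<String> commentsIn(String xml)
   throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilder parser
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document document
     = parser.parse(new InputSource(new StringReader(xml)));
    List<String> result = new ArrayList<String>();
    collect(document, result);
    return result;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<!-- prolog --><doc><!-- first -->"
     + "<p>text<!-- second --></p></doc><!-- epilog -->";
    for (String comment : commentsIn(xml)) {
      System.out.println("[" + comment + "]");
    }
  }

}
```

Starting the walk at the Document node rather than at the root element matters here. Comments in the prolog and epilog are children of the document itself, so they would be missed otherwise.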

Here’s the result of running this program on the XML Schema Datatypes specification:

D:\books\XMLJAVA>java DOMCommentReader 
 http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/datatypes.xml
  commenting these out means only that they won't show up in the
    stylesheet generated "Revisions from previous draft" appendix


 Changes before Sept public draft commented out...
<sitem>
19990521: PVB: corrected definition of length and maxLengths 
facet for strings to be in terms of <emph>characters</emph> 
not <emph>bytes</emph>
</sitem>
<sitem>
19990521: PVB: removed issue "other-date-representations".  
We don't want other separators, left mention of aggregate reps 
for dates as an ednote.
</sitem>
<sitem>
19990521: PVB: fixed "holidays" example, "-0101" ==> "==0101"
(where == in the correction should be two hyphens, but that would
not allow us to comment out this sitem)
…

It’s not obvious from this output sample, but there is a big difference between the behavior of the SAX and DOM versions of this program. The SAX version begins producing output almost immediately because it works in streaming mode. The DOM version first has to read the entire document from the remote URL and parse it, and only then can it begin walking the tree to look for comments. Both versions are limited by the speed of the network connection, so they take about the same total time to run on the same input data. However, the SAX version returns its first results much sooner than the DOM version, which doesn’t present any results until the entire document has been read. This may not be a big concern in a batch-mode application, but it can be very important when there is a human user. The SAX version will feel a lot more responsive.
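
To make the streaming behavior concrete, here is a rough sketch of a SAX handler that records events in the order the parser delivers them. It is not the Chapter 7 program. The class name StreamingCommentLog and the log format are assumptions made for illustration. It relies on DefaultHandler2 from org.xml.sax.ext, which implements LexicalHandler, and on the standard SAX lexical-handler property.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that logs comment events as they arrive
public class StreamingCommentLog extends DefaultHandler2 {

  private final List<String> log = new ArrayList<String>();

  // Called by the parser as soon as it has read a complete comment
  @Override
  public void comment(char[] text, int start, int length) {
    log.add("comment:" + new String(text, start, length));
  }

  // Called once, after the whole document has been read
  @Override
  public void endDocument() {
    log.add("endDocument");
  }

  public static List<String> run(String xml)
   throws ParserConfigurationException, SAXException, IOException {
    XMLReader reader
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    StreamingCommentLog handler = new StreamingCommentLog();
    reader.setContentHandler(handler);
    // Comments are reported through the LexicalHandler interface
    reader.setProperty(
     "http://xml.org/sax/properties/lexical-handler", handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return handler.log;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<doc><!-- a --><p/><!-- b --></doc>";
    for (String event : run(xml)) {
      System.out.println(event);
    }
  }

}
```

Every comment entry lands in the log before endDocument. A SAX program can act on each comment while the rest of the document is still arriving, whereas the DOM program cannot begin its walk until parse() has returned the finished tree.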


Copyright 2001, 2002 Elliotte Rusty Harold, elharo@metalab.unc.edu. Last Modified May 26, 2002