The DocType class


The org.jdom.DocType class summarized in Example 15.19 represents a document type declaration. The declaration may point to an external DTD subset, contain an internal DTD subset, or both, but it is not the document type definition (DTD) itself. JDOM does not have any representation of the DTD.

Example 15.19. The JDOM DocType class

package org.jdom;

public class DocType implements Serializable, Cloneable {

  protected String   elementName;
  protected String   publicID;
  protected String   systemID;
  protected Document document;
  protected String   internalSubset;

  protected DocType();
  public DocType(String elementName, String publicID, 
   String systemID);
  public DocType(String elementName, String systemID);
  public DocType(String elementName);

  public String   getElementName();
  public DocType  setElementName(String elementName);
  public String   getPublicID();
  public DocType  setPublicID(String publicID);
  public String   getSystemID();
  public DocType  setSystemID(String systemID);
  public Document getDocument();
  public void     setInternalSubset(String newData);

  public String getInternalSubset();

  public       String  toString();
  public final boolean equals(Object o);
  public final int     hashCode();
  public       Object  clone();

}

Each DocType object has four String properties, of which the last three may be null:

root element name
internal DTD subset
system ID
public ID

For example, consider this document type declaration:

<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
                         "docbook/docbookx.dtd">

Its root element name is chapter, its public ID is -//OASIS//DTD DocBook XML V4.1.2//EN, and its system ID is docbook/docbookx.dtd. It has no internal DTD subset, so that property is null. This code fragment constructs a DocType object representing this document type declaration and uses it to construct a new Document object:

DocType doctype = new DocType("chapter", 
 "-//OASIS//DTD DocBook XML V4.1.2//EN", "docbook/docbookx.dtd");
Element chapter = new Element("chapter");
Document doc = new Document(chapter, doctype);

However, JDOM requires only well-formedness, not validity. This means that the root element may in fact differ from the one the document type declaration names. For example, this is perfectly legal:

DocType doctype = new DocType("chapter", 
 "-//OASIS//DTD DocBook XML V4.1.2//EN", "docbook/docbookx.dtd");
Element book = new Element("book");
Document doc = new Document(book, doctype);

This document type declaration has a root element name and an internal DTD subset, but no public ID or system ID:

<!DOCTYPE Fibonacci_Numbers [
  <!ELEMENT Fibonacci_Numbers (fibonacci*)>
  <!ELEMENT fibonacci (#PCDATA)>
  <!ATTLIST fibonacci index CDATA #IMPLIED>
]>

To set this up, construct the DocType object with just the root element name. Then store the internal subset in a String and pass it to the setInternalSubset() method, like so:

DocType doctype = new DocType("Fibonacci_Numbers");
String dtd = "<!ELEMENT Fibonacci_Numbers (fibonacci*)>\n";
dtd += "<!ELEMENT fibonacci (#PCDATA)>\n";
dtd += "<!ATTLIST fibonacci index CDATA #IMPLIED>\n";
doctype.setInternalSubset(dtd);
Element root = new Element("Fibonacci_Numbers");
Document doc = new Document(root, doctype);
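Because a DocType object is really just these four strings, it can help to see how they map back onto declaration syntax. The following is a minimal sketch in plain Java, not JDOM's own serialization code; DoctypeSketch and format() are hypothetical names. It follows the XML 1.0 ExternalID production, under which the PUBLIC keyword must be followed by both a public ID and a system ID, so the sketch rejects a public ID on its own.

```java
/**
 * Hypothetical helper (not part of JDOM): renders the four DocType
 * properties as a document type declaration string.
 */
final class DoctypeSketch {

    /** Any of publicID, systemID, and internalSubset may be null. */
    static String format(String elementName, String publicID,
                         String systemID, String internalSubset) {
        if (elementName == null) {
            throw new IllegalArgumentException("root element name is required");
        }
        // XML 1.0 only allows PUBLIC when a system literal follows it
        if (publicID != null && systemID == null) {
            throw new IllegalArgumentException("a public ID requires a system ID");
        }
        StringBuilder out = new StringBuilder("<!DOCTYPE ").append(elementName);
        if (publicID != null) {
            out.append(" PUBLIC ").append(quote(publicID))
               .append(' ').append(quote(systemID));
        } else if (systemID != null) {
            out.append(" SYSTEM ").append(quote(systemID));
        }
        if (internalSubset != null) {
            // the internal subset sits between square brackets
            out.append(" [\n").append(internalSubset).append(']');
        }
        return out.append('>').toString();
    }

    // A literal containing a double quote must be delimited by single quotes
    private static String quote(String literal) {
        char q = literal.indexOf('"') >= 0 ? '\'' : '"';
        return q + literal + q;
    }

    public static void main(String[] args) {
        System.out.println(format("chapter",
            "-//OASIS//DTD DocBook XML V4.1.2//EN", "docbook/docbookx.dtd", null));
        System.out.println(format("Fibonacci_Numbers", null, null,
            "<!ELEMENT Fibonacci_Numbers (fibonacci*)>\n"
            + "<!ELEMENT fibonacci (#PCDATA)>\n"
            + "<!ATTLIST fibonacci index CDATA #IMPLIED>\n"));
    }
}
```

For real documents you would let JDOM's output classes write the declaration; the point here is only the mapping from properties to syntax.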

Unlike most other node classes, DocType does not fully check its data for well-formedness. It does test that the root element name is a legal XML name, and it checks that the public and system IDs adhere to the minimum constraints for these items. However, it does not check that the public ID follows the standard conventions for public identifiers, it does not check that the system ID is a legal URL, and it does not even check the characters in the internal DTD subset, much less the syntax.
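If you need some of the checks DocType skips, you can run them yourself before constructing the object. Here is a rough sketch; DocTypeChecks and its methods are hypothetical. The first method tests the XML 1.0 PubidChar set. The second is a simplified test for the usual -//Owner//Class Description//Language shape of a formal public identifier; ISO-owned identifiers and display versions are deliberately not handled. The third asks java.net.URI whether the system ID parses. That last test is stricter than XML itself, whose system literals may contain characters such as spaces that a URI parser rejects, so treat a failure as a warning rather than proof of an error.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical checks that JDOM's DocType class does not perform. */
final class DocTypeChecks {

    /** XML 1.0 PubidChar: space, CR, LF, ASCII letters and digits, -'()+,./:=?;!*#@$_% */
    static boolean hasOnlyPubidChars(String id) {
        String punctuation = "-'()+,./:=?;!*#@$_%";
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            boolean ok = c == ' ' || c == '\r' || c == '\n'
                || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || punctuation.indexOf(c) >= 0;
            if (!ok) return false;
        }
        return true;
    }

    /** Simplified: accepts only "-//Owner//Class Description//Lang" or the "+//" form. */
    static boolean looksLikeFormalPublicId(String id) {
        String[] parts = id.split("//", -1);
        if (parts.length != 4) return false;
        if (!parts[0].equals("-") && !parts[0].equals("+")) return false;
        String owner = parts[1], text = parts[2], language = parts[3];
        // the text part is a class keyword such as DTD, a space, then a description
        int space = text.indexOf(' ');
        return !owner.isEmpty() && space > 0 && space < text.length() - 1
            && !language.isEmpty();
    }

    /** True if java.net.URI accepts the system ID (absolute or relative). */
    static boolean isParsableUri(String systemId) {
        try {
            new URI(systemId);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String publicID = "-//OASIS//DTD DocBook XML V4.1.2//EN";
        System.out.println(hasOnlyPubidChars(publicID));
        System.out.println(looksLikeFormalPublicId(publicID));
        System.out.println(isParsableUri("docbook/docbookx.dtd"));
    }
}
```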

As an example of this class, let’s look at a program that validates XHTML 1.0 documents. XHTML validity is a little stricter than HTML validity. In particular, according to the XHTML 1.0 specification, a valid XHTML document must satisfy these four conditions:

It must be valid according to one of the three XHTML DTDs: strict, transitional, or frameset.
The root element of the document must be html.
This root html element of the document must specify the default namespace as http://www.w3.org/1999/xhtml using an xmlns attribute.
The document must contain a DOCTYPE declaration. The public identifier for the external DTD subset must reference one of the three XHTML DTDs using one of these three public identifiers:
- -//W3C//DTD XHTML 1.0 Strict//EN
- -//W3C//DTD XHTML 1.0 Transitional//EN
- -//W3C//DTD XHTML 1.0 Frameset//EN

There are a few other flaky rules scattered throughout the XHTML specification, mostly involving constraints that can't reasonably be specified in a DTD, such as the rule that an a element cannot contain another a element. However, these four are the major ones that define strict XHTML conformance.

Example 15.20 is similar to the earlier JDOMValidator. That is, it reads URLs from the command line and validates the document found at each URL against its DTD. However, it also checks the constraints listed above. Of particular interest for this section is that it checks that the document type declaration points to one of the three legal DTDs. This is something pure XML validation normally doesn't tell you.

Example 15.20. Validating XHTML with the DocType class

import java.io.IOException;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XHTMLValidator {

  public static void main(String[] args) {
    
    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }   
    
  }

  private static SAXBuilder builder = new SAXBuilder(true);
                               /* turn on validation ^^^^ */
  
  // not thread safe: every call shares the one SAXBuilder above
  public static void validate(String source) {
        
      Document document;
      try {
        document = builder.build(source); 
      }
      catch (JDOMException e) {  
        System.out.println(source 
         + " is invalid XML, and thus not XHTML."); 
        return; 
      }
      catch (IOException e) {  
        System.out.println("Could not read: " + source); 
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML 
      boolean valid = true;       
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        System.out.println("No DOCTYPE");
        valid = false;
      }
      else {
        // verify the DOCTYPE
        String name     = doctype.getElementName();
        String systemID = doctype.getSystemID();
        String publicID = doctype.getPublicID();
      
        if (!name.equals("html")) {
          System.out.println(
           "Incorrect root element name " + name);
          valid = false;
        }
    
        if (publicID == null
         || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals(
            "-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals(
            "-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
          valid = false;
          System.out.println(source 
           + " does not seem to use an XHTML 1.0 DTD");
        }
      }
    
    
      // Check the namespace on the root element
      Element root = document.getRootElement();
      Namespace namespace = root.getNamespace();
      String prefix = namespace.getPrefix();
      String uri = namespace.getURI();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        valid = false;
        System.out.println(source 
         + " does not properly declare the"
         + " http://www.w3.org/1999/xhtml namespace"
         + " on the root element");        
      }
      if (!prefix.equals("")) {
        valid = false;
        System.out.println(source 
         + " does not use the empty prefix for XHTML");        
      }
      
      if (valid) System.out.println(source + " is valid XHTML.");
    
  }

}

Here’s the result of running this program on the XHTML 1.0 specification:

D:\books\XMLJAVA>java XHTMLValidator http://www.w3.org/TR/xhtml1/
http://www.w3.org/TR/xhtml1/ is valid XHTML.

As one would hope, it proves valid.
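The chain of equals() calls in the public ID test works, but it grows awkward as the list of acceptable identifiers grows. One alternative is a lookup table, which also tells you which of the three DTDs a document uses. This is a sketch assuming Java 9 or later for Map.of; XhtmlPublicIds and flavorOf() are hypothetical names.

```java
import java.util.Map;

/** Hypothetical lookup of the three XHTML 1.0 public identifiers. */
final class XhtmlPublicIds {

    // Each legal XHTML 1.0 public ID mapped to the name of its DTD
    private static final Map<String, String> FLAVORS = Map.of(
        "-//W3C//DTD XHTML 1.0 Strict//EN",       "strict",
        "-//W3C//DTD XHTML 1.0 Transitional//EN", "transitional",
        "-//W3C//DTD XHTML 1.0 Frameset//EN",     "frameset");

    /** Returns the DTD name, or null if publicID is null or not an XHTML 1.0 ID. */
    static String flavorOf(String publicID) {
        // Map.of maps throw NullPointerException on get(null), so guard first
        return publicID == null ? null : FLAVORS.get(publicID);
    }

    public static void main(String[] args) {
        for (String id : args) {
            String flavor = flavorOf(id);
            System.out.println(id + ": "
                + (flavor == null ? "not an XHTML 1.0 DTD" : flavor));
        }
    }
}
```

In XHTMLValidator, the whole public ID condition would then reduce to a single flavorOf(publicID) == null test.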


Copyright 2001, 2002 Elliotte Rusty Harold	elharo@metalab.unc.edu	Last Modified August 11, 2002