Receiving Skipped Entities

Validating parsers resolve all general entity references that occur in both element content and attribute values. However, non-validating parsers are not required to read the external DTD subset, so they may encounter references to entities whose declarations they have never seen. For example, consider the simple XHTML document in Example 6.14.

Example 6.14. An XML document containing a potentially skipped entity reference

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
     <h1>My resum&eacute;</h1>
  </body> 
</html>

If a parser does not read the DTD, it has no way of knowing what the entity reference &eacute; stands for, or even whether that entity is properly declared at all. A non-validating parser therefore assumes that the entity is declared in the external DTD subset it didn't read. Rather than reporting replacement text for that entity, it reports a skipped entity using the skippedEntity() callback method:

public void skippedEntity(String name)
    throws SAXException;
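The following is a minimal sketch of receiving this callback, assuming the parser bundled with the JDK. SkippedEntityDemo and its findSkippedEntities() helper are hypothetical names, and the load-external-dtd feature URI is Xerces-specific: it is assumed to work here because the JDK's built-in parser is derived from Xerces, and another parser may reject it with a SAXNotRecognizedException. Switching the feature off makes the parser behave like the non-validating parser just described, so it never fetches xhtml1-strict.dtd and has no declaration for &eacute;.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SkippedEntityDemo {

    // Example 6.14, inlined so the sketch needs no files or network access
    static final String EXAMPLE =
          "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
        + "  <body>\n"
        + "    <h1>My resum&eacute;</h1>\n"
        + "  </body>\n"
        + "</html>\n";

    // Hypothetical helper: parses xml without reading the external DTD
    // subset and returns every name passed to skippedEntity(), in order.
    static List<String> findSkippedEntities(String xml) {
        List<String> skipped = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void skippedEntity(String name) {
                // General entities arrive by bare name ("eacute"); per the SAX
                // documentation, parameter entity names begin with '%' and a
                // skipped external DTD subset is reported as "[dtd]".
                skipped.add(name);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Xerces-specific feature (assumed honored by the JDK's built-in
            // parser): tells a non-validating parser not to fetch the DTD.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return skipped;
    }

    public static void main(String[] args) {
        for (String name : findSkippedEntities(EXAMPLE)) {
            System.out.println("skipped entity: " + name);
        }
    }
}
```

Run against Example 6.14, main() should list eacute among the skipped entities.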

For example, according to the XHTML 1.0 specification, if a User Agent such as a browser “encounters an entity reference (other than one of the predefined HTML entities) for which the User Agent has processed no declaration (which could happen if the declaration is in the external subset which the User Agent hasn’t read), the entity reference should be processed as the characters (starting with the ampersand and ending with the semi-colon) that make up the entity reference.” In other words, rather than rendering &prescription_take; as the symbol ℞, the browser is supposed to draw it as simply &prescription_take;. If you were writing an XHTML browser that did not validate but did have to conform fully to XHTML 1.0, you would probably implement the skippedEntity() method by passing an ampersand, the name of the entity reference, and a semicolon to the characters() method in the same content handler, like this:

  public void skippedEntity(String name)
    throws SAXException {

    // Rebuild the reference exactly as it appeared, e.g. "&eacute;",
    // and report it as ordinary character data
    char[] text = ("&" + name + ";").toCharArray();
    this.characters(text, 0, text.length);

  }
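To see that fragment in context, here is a sketch of a complete content handler that collects the document's text and applies the XHTML fallback. The class XhtmlTextCollector and its textOf() helper are hypothetical, and, as an assumption about the parser in use, it relies on the JDK's Xerces-derived parser honoring the Xerces-specific load-external-dtd feature so that the DTD is never read. Unlike the bare fragment, it also ignores parameter entities and the "[dtd]" pseudo-entity, which SAX may report through the same callback but which are not character data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlTextCollector extends DefaultHandler {

    // Example 6.14, inlined so no files or network access are needed
    static final String EXAMPLE =
          "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
        + "  <body>\n"
        + "    <h1>My resum&eacute;</h1>\n"
        + "  </body>\n"
        + "</html>\n";

    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void skippedEntity(String name) throws SAXException {
        // Parameter entities ("%name") and the external subset ("[dtd]")
        // are not character data, so only general entities are echoed.
        if (name.startsWith("%") || name.equals("[dtd]")) {
            return;
        }
        // XHTML 1.0 fallback: report "&name;" as ordinary text
        char[] reference = ("&" + name + ";").toCharArray();
        this.characters(reference, 0, reference.length);
    }

    // Hypothetical helper: all character data in xml, parsed without
    // reading the external DTD subset.
    static String textOf(String xml) {
        XhtmlTextCollector collector = new XhtmlTextCollector();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Xerces-specific feature, assumed honored by the JDK's parser
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), collector);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return collector.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(textOf(EXAMPLE).trim());
    }
}
```

Because skippedEntity() funnels the reference through characters(), the rest of the handler never needs to know that an entity was skipped; for Example 6.14 the collected text should read My resum&eacute; rather than My resumé.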

Skipped entities can also appear in attribute values. For example,

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
     <div purpose="resum&eacute;">
     ...
     </div>
  </body> 
</html>

This is one of the few holes in SAX: the parser does not report a skipped entity in an attribute value to you at all. Instead, it calculates the attribute value by simply deleting the entity reference. In this example, if the parser does not read the DTD, the value of the purpose attribute is reported as “resum”.
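The following sketch shows the effect from the application's side: it parses the document above with external DTD loading turned off and reads back the purpose attribute in startElement(). SkippedAttributeEntityDemo and purposeValue() are hypothetical names, and turning off DTD loading relies on the Xerces-specific load-external-dtd feature, assumed here to be honored by the JDK's bundled parser. If the parser behaves as described above, the reported value is resum, and the handler has no way to learn that &eacute; was ever there.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SkippedAttributeEntityDemo {

    // The attribute example above, inlined (the "..." content omitted)
    static final String EXAMPLE =
          "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
        + "  <body>\n"
        + "    <div purpose=\"resum&eacute;\"></div>\n"
        + "  </body>\n"
        + "</html>\n";

    // Hypothetical helper: the purpose attribute of the first div, exactly
    // as the parser reports it when the external DTD subset is not read.
    static String purposeValue(String xml) {
        String[] value = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // SAX offers no callback for the dropped reference here;
                // the handler sees only the finished attribute value.
                if (value[0] == null && "div".equals(qName)) {
                    value[0] = atts.getValue("purpose");
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Xerces-specific feature, assumed honored by the JDK's parser
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return value[0];
    }

    public static void main(String[] args) {
        System.out.println("purpose = \"" + purposeValue(EXAMPLE) + "\"");
    }
}
```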


Copyright 2001, 2002 Elliotte Rusty Harold, elharo@metalab.unc.edu. Last Modified April 30, 2002