Chapter 17 of the XML Bible, Second Edition: XSL Transformations


The Extensible Stylesheet Language (XSL) includes both a transformation language and a formatting language. Each of these, naturally enough, is an XML application. The transformation language provides elements that define rules for how one XML document is transformed into another XML document. The transformed XML document may use the markup and DTD of the original document, or it may use a completely different set of elements. In particular, it may use the elements defined by the second part of XSL, the formatting objects. This chapter discusses the transformation language half of XSL.

What Is XSL?

The transformation and formatting halves of XSL can function independently of each other. For instance, the transformation language can transform an XML document into a well-formed HTML file, and completely ignore XSL formatting objects. This is the style of XSL previewed in Chapter 5 and emphasized in this chapter. Furthermore, it's not absolutely required that a document written in XSL formatting objects be produced by using the transformation part of XSL on another XML document. For example, it's easy to imagine a converter written in Java that reads TeX or PDF files and translates them into XSL formatting objects (though no such converters exist as of early 2001).

In essence, XSL is two languages, not one. The first language is a transformation language, the second a formatting language. The transformation language is useful independent of the formatting language. Its ability to move data from one XML representation to another makes it an important component of XML-based electronic commerce, electronic data interchange, metadata exchange, and any application that needs to convert between different XML representations of the same data. These uses are also united by their lack of concern with rendering data on a display for humans to read. They are purely about moving data from one computer system or program to another.

Consequently, many early implementations of XSL focus exclusively on the transformation part and ignore the formatting objects. These are incomplete implementations, but nonetheless useful. Not all data must ultimately be rendered on a computer monitor or printed on paper.

Cross-Reference

Chapter 18 discusses the XSL formatting language.

A Word of Caution about XSL

XSL is still under development. The language has changed radically in the past, and will almost certainly change again in the future. This chapter is based on the November 16, 1999 XSLT 1.0 Recommendation. Because XSLT is now an official Recommendation of the World Wide Web Consortium (W3C), I'm hopeful that any changes that do occur will simply add to the existing syntax without invalidating style sheets that adhere to the 1.0 spec. Indeed the W3C has just begun work on XSLT 1.1 and 2.0, and it does seem likely that all legal XSLT 1.0 documents will still be legal XSLT 1.1 and 2.0 documents.

Not all software has caught up to the 1.0 Recommendation, however. In particular, Version 5.5 and earlier of Internet Explorer implement only a very old working draft of XSLT that looks almost nothing like the finished standard. You should not expect most of the examples in this chapter to work with IE, even after substantial tweaking. Moreover, the language that IE does implement is not XSLT; and any book or person that tells you otherwise is telling you an untruth. Both Microsoft's live presentations and the written documentation it posts on its Web site are notorious for teaching nonstandard Microsoft versions of XSLT (and other languages) without clearly distinguishing which parts are real XSLT and which are Microsoft extensions to (some would say perversions of) standard XSLT.

In November 2000 Microsoft released MSXML 3.0, an XML parser/XSLT processor for IE that does come much closer to supporting XSLT 1.0. You can download it from http://msdn.microsoft.com/xml/general/xmlparser.asp. However, there are still some bugs and areas where Microsoft did not follow the specification, so this is not quite a complete implementation of XSLT 1.0. More importantly, MSXML 3.0 is not bundled with IE5.5; and even if you install it, it does not automatically replace the earlier, non-standard-compliant version of MSXML that is bundled. To replace the old version, you have to download and run a separate program called xmlinst.exe, which you can get from the same page where you found MSXML 3.0.

Overview of XSL Transformations

In an XSL transformation, an XSLT processor reads both an XML document and an XSLT style sheet. Based on the instructions the processor finds in the XSLT style sheet, it outputs a new XML document or fragment thereof. There's also special support for outputting HTML. With some effort most XSLT processors can also be made to output essentially arbitrary text, though XSLT is designed primarily for XML-to-XML and XML-to-HTML transformations.

Trees

As you learned in Chapter 6, every well-formed XML document is a tree. A tree is a data structure composed of connected nodes beginning with a top node called the root. The root is connected to its child nodes, each of which is connected to zero or more children of its own, and so forth. Nodes that have no children of their own are called leaves. A diagram of a tree looks much like a genealogical descendant chart that lists the descendants of a single ancestor. The most useful property of a tree is that each node and its children also form a tree. Thus, a tree is a hierarchical structure of trees in which each tree is built out of smaller trees.

For the purposes of XSLT, elements, attributes, namespaces, processing instructions, comments, and text are all counted as nodes. Furthermore, the root of the document must be distinguished from the root element. Thus, XSLT processors model an XML document as a tree that contains seven kinds of nodes:

The root
Elements
Text
Attributes
Namespaces
Processing instructions
Comments

The Document Type Definition (DTD) and document type declaration are specifically not included in this tree. However, a DTD may add default attribute values to some elements, which then become additional attribute nodes in the tree.

For example, consider the XML document in Listing 17-1. This shows part of the periodic table of the elements. I’ll be using this as an example in this chapter.

On the CD-ROM

The complete periodic table appears on the CD-ROM in the file allelements.xml in the examples/periodic_table directory.

The root PERIODIC_TABLE element contains ATOM child elements. Each ATOM element contains several child elements providing the atomic number, atomic weight, symbol, boiling point, and so forth. A UNITS attribute specifies the units for those elements that have units.

Note

ELEMENT would be a more appropriate name here than ATOM. However, writing about ELEMENT elements and trying to distinguish between chemical elements and XML elements might create confusion. Thus, at least for the purposes of this chapter, ATOM seemed like the more legible option.

Listing 17-1: An XML periodic table with two atoms: hydrogen and helium

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="17-2.xsl"?>
<PERIODIC_TABLE>
  <ATOM STATE="GAS">
    <NAME>Hydrogen</NAME>
    <SYMBOL>H</SYMBOL>
    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
    <BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
    <DENSITY UNITS="grams/cubic centimeter">
      <!-- At 300K, 1 atm -->
      0.0000899
    </DENSITY>
  </ATOM>
  <ATOM STATE="GAS">
    <NAME>Helium</NAME>
    <SYMBOL>He</SYMBOL>
    <ATOMIC_NUMBER>2</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT>
    <BOILING_POINT UNITS="Kelvin">4.216</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">0.95</MELTING_POINT>
    <DENSITY UNITS="grams/cubic centimeter"><!-- At 300K -->
      0.0001785
    </DENSITY>
  </ATOM>
</PERIODIC_TABLE>

Figure 17-1 displays a tree diagram of this document. It begins at the top with the root node (not the same as the root element!) which contains two child nodes, the xml-stylesheet processing instruction and the root element PERIODIC_TABLE. (The XML declaration is not visible to the XSLT processor and is not included in the tree the XSLT processor operates on.) The PERIODIC_TABLE element contains two child nodes, both ATOM elements. Each ATOM element has an attribute node for its STATE attribute, and a variety of child element nodes. Each child element contains a node for its contents, as well as nodes for any attributes, comments and processing instructions it possesses. Notice in particular that many nodes are something other than elements. There are nodes for text, attributes, comments, namespaces and processing instructions. Unlike CSS, XSL is not limited to working only with whole elements. It has a much more granular view of a document that enables you to base styles on comments, attributes, processing instructions, element content, and more.

Note

Like the XML declaration, an internal DTD subset or DOCTYPE declaration is not part of the tree. However, it may have the effect of adding attribute nodes to some elements through <!ATTLIST> declarations that use #FIXED or default attribute values.


Figure 17-1: Listing 17-1 as a tree diagram
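To see the node model of Figure 17-1 concretely, the following sketch parses Listing 17-1 and counts several node kinds with XPath, the expression language that XSLT patterns are built on. It assumes the javax.xml.parsers and javax.xml.xpath APIs bundled with modern Java (not part of the chapter's Xalan setup), and the class name NodeKinds is hypothetical. From the listing you would expect one root node, 17 element nodes, 8 attribute nodes, 2 comments, and 1 processing instruction; the XML declaration does not count as a processing instruction. Text nodes are left out because the number of white-space-only text nodes depends on how the document is indented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NodeKinds {
    // Listing 17-1, embedded as a string so the example needs no files.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>\n"
        + "<PERIODIC_TABLE>\n"
        + "  <ATOM STATE=\"GAS\">\n"
        + "    <NAME>Hydrogen</NAME>\n"
        + "    <SYMBOL>H</SYMBOL>\n"
        + "    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>\n"
        + "    <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>\n"
        + "    <BOILING_POINT UNITS=\"Kelvin\">20.28</BOILING_POINT>\n"
        + "    <MELTING_POINT UNITS=\"Kelvin\">13.81</MELTING_POINT>\n"
        + "    <DENSITY UNITS=\"grams/cubic centimeter\">\n"
        + "      <!-- At 300K, 1 atm -->\n"
        + "      0.0000899\n"
        + "    </DENSITY>\n"
        + "  </ATOM>\n"
        + "  <ATOM STATE=\"GAS\">\n"
        + "    <NAME>Helium</NAME>\n"
        + "    <SYMBOL>He</SYMBOL>\n"
        + "    <ATOMIC_NUMBER>2</ATOMIC_NUMBER>\n"
        + "    <ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT>\n"
        + "    <BOILING_POINT UNITS=\"Kelvin\">4.216</BOILING_POINT>\n"
        + "    <MELTING_POINT UNITS=\"Kelvin\">0.95</MELTING_POINT>\n"
        + "    <DENSITY UNITS=\"grams/cubic centimeter\"><!-- At 300K -->\n"
        + "      0.0001785\n"
        + "    </DENSITY>\n"
        + "  </ATOM>\n"
        + "</PERIODIC_TABLE>\n";

    // Parses the document and evaluates an XPath count() expression against it.
    static int count(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        String[] kinds = {
            "count(/)",                          // the root node (not the root element)
            "count(//*)",                        // element nodes
            "count(//@*)",                       // attribute nodes
            "count(//comment())",                // comment nodes
            "count(//processing-instruction())"  // processing-instruction nodes
        };
        for (String k : kinds) {
            System.out.println(k + " = " + count(k));
        }
    }
}
```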

XSLT operates by transforming one XML tree into another XML tree. More precisely, an XSLT processor accepts as input a tree represented as an XML document and produces as output a new tree, also represented as an XML document. Consequently, the transformation part of XSL is also called the tree construction part. The XSL transformation language contains operators for selecting nodes from the tree, reordering the nodes, and outputting nodes. If one of these nodes is an element node, then it may be an entire tree itself. Remember that all these operators, both for input and output, are designed for operation on a tree.

The input must be an XML document. You cannot use XSLT to transform from non-XML formats such as PDF, TeX, Microsoft Word, PostScript, MIDI, or others. HTML and SGML are borderline cases because they're so close to XML. XSLT can work with HTML and SGML documents that satisfy XML's well-formedness rules. However, XSLT cannot handle the wide variety of non-well-formed HTML and SGML that you encounter on most Web sites and document production systems. XSLT is not a general-purpose regular expression language for transforming arbitrary data.

Most of the time the output of an XSLT transformation is also an XML document. However, it can also be a result tree fragment that could be used as an external parsed entity in another XML document. (That is, it would be a well-formed XML document if it were enclosed in a single root element.) In other words, the output may not necessarily be a well-formed XML document, but it will at least be a plausible part of a well-formed XML document. An XSLT transformation cannot output text that is malformed XML such as

<B><I>Tag Mismatch!</B></I>
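One reason an ordinary template cannot produce this output is that the template is itself XML. A style sheet whose template contains the mismatched tags above is not well-formed, so the processor rejects it before any transformation runs. The sketch below checks this, assuming the javax.xml.transform (TrAX) API bundled with modern Java rather than the Xalan command line used later in this chapter; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class WellFormedOnly {
    // A template whose literal result elements nest properly.
    static final String GOOD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><B><I>Tags nest</I></B></xsl:template>"
        + "</xsl:stylesheet>";

    // The same template with the mismatched tags: not well-formed XML.
    static final String BAD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><B><I>Tag Mismatch!</B></I></xsl:template>"
        + "</xsl:stylesheet>";

    // Returns true if the processor accepts the style sheet, false if it rejects it.
    static boolean compiles(String xsl) {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Fail fast on errors instead of letting the default listener log them.
        factory.setErrorListener(new ErrorListener() {
            public void warning(TransformerException e) { }
            public void error(TransformerException e) throws TransformerException { throw e; }
            public void fatalError(TransformerException e) throws TransformerException { throw e; }
        });
        try {
            return factory.newTemplates(new StreamSource(new StringReader(xsl))) != null;
        } catch (TransformerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("GOOD compiles: " + compiles(GOOD));
        System.out.println("BAD compiles:  " + compiles(BAD));
    }
}
```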

Tip

The xsl:output element and disable-output-escaping attribute discussed below loosen this restriction somewhat.

Most XSLT processors also support output as HTML and/or raw text, although the standard does not require them to do so. To some extent this allows you to transform to non-XML formats like TeX, RTF, or PostScript. However XSLT is not designed to make these transformations easy. It is designed for XML-to-XML transformations. If you need a non-XML output format, it will probably be easier to use XSLT to transform the XML to an intermediate format like TeXML (http://www.alphaworks.ibm.com/tech/texml), and then use additional, non-XSLT software to transform that into the format you want.

XSLT style sheet documents

An XSLT document contains template rules. A template rule has a pattern specifying the nodes it matches and a template to be instantiated and output when the pattern is matched. When an XSLT processor transforms an XML document using an XSL style sheet, it walks the XML document tree, looking at each node in turn. As each node in the XML document is read, the processor compares it with the pattern of each template rule in the style sheet. When the processor finds a node that matches a template rule's pattern, it outputs the rule's template. This template generally includes some markup, some new data, and some data copied out of the source XML document.
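The matching process described above can be modeled in a few lines. This is a deliberately simplified toy, not how a real XSLT processor works: its "patterns" are plain element names, its "templates" are Java lambdas, and the Runnable passed to each template stands in for xsl:apply-templates. Unmatched elements simply pass processing on to their children, text is copied, and comments and processing instructions produce nothing. All names (ToyWalker, Template, run) are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Toy model of template matching; NOT a real XSLT processor.
public class ToyWalker {
    // A template receives the matched element, the output buffer, and a
    // callback playing the role of xsl:apply-templates (process my children).
    interface Template {
        void apply(Element matched, StringBuilder out, Runnable applyTemplates);
    }

    static final String SAMPLE = "<PERIODIC_TABLE>"
        + "<ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>"
        + "<ATOM><NAME>Helium</NAME><SYMBOL>He</SYMBOL></ATOM>"
        + "</PERIODIC_TABLE>";

    private final Map<String, Template> rules;
    private final StringBuilder out = new StringBuilder();

    ToyWalker(Map<String, Template> rules) {
        this.rules = rules;
    }

    void walk(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE) {               // text is copied to the output
            out.append(node.getNodeValue());
            return;
        }
        if (type != Node.ELEMENT_NODE && type != Node.DOCUMENT_NODE) {
            return;                                 // comments, PIs: no output
        }
        Runnable children = () -> {
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                walk(c);
            }
        };
        Template t = (type == Node.ELEMENT_NODE) ? rules.get(node.getNodeName()) : null;
        if (t != null) {
            t.apply((Element) node, out, children); // pattern matched: instantiate template
        } else {
            children.run();                         // no match: keep descending
        }
    }

    static String run(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // Rough analogues of the two template rules in Listing 17-2.
        Map<String, Template> rules = new HashMap<>();
        rules.put("PERIODIC_TABLE", (e, buf, kids) -> { buf.append("<html>"); kids.run(); buf.append("</html>"); });
        rules.put("ATOM", (e, buf, kids) -> { buf.append("<P>"); kids.run(); buf.append("</P>"); });
        ToyWalker walker = new ToyWalker(rules);
        walker.walk(doc);
        return walker.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(SAMPLE)); // prints <html><P>HydrogenH</P><P>HeliumHe</P></html>
    }
}
```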

XSLT uses XML to describe these rules, templates, and patterns. The root element of the XSLT document is either a stylesheet or a transform element in the http://www.w3.org/1999/XSL/Transform namespace. By convention this namespace is mapped to the xsl prefix, but you're free to pick another prefix if you prefer. In this chapter, I always use the xsl prefix. From this point forward it should be understood that the prefix xsl is mapped to the http://www.w3.org/1999/XSL/Transform namespace.
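Because only the namespace URI matters, a style sheet that maps some other prefix to http://www.w3.org/1999/XSL/Transform should behave exactly like one that uses xsl. The sketch below runs the two template rules of Listing 17-2 rewritten with a t prefix, assuming the javax.xml.transform API bundled with modern Java; the class name and the trimmed input document are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AnyPrefix {
    // Listing 17-2 rewritten with the prefix "t" instead of "xsl".
    static final String XSL =
        "<t:stylesheet version=\"1.0\" xmlns:t=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<t:template match=\"PERIODIC_TABLE\"><html><t:apply-templates/></html></t:template>"
        + "<t:template match=\"ATOM\"><P><t:apply-templates/></P></t:template>"
        + "</t:stylesheet>";

    // A trimmed version of Listing 17-1 (only the NAME children kept).
    static final String XML =
        "<PERIODIC_TABLE><ATOM><NAME>Hydrogen</NAME></ATOM>"
        + "<ATOM><NAME>Helium</NAME></ATOM></PERIODIC_TABLE>";

    static String transform(String xsl, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XSL, XML));
    }
}
```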

Tip

If you get the namespace URI wrong, either by using a URI from an older draft of the specification, such as http://www.w3.org/TR/WD-xsl, or simply by making a typo in the normal URI, the XSLT processor will output the style sheet document itself instead of the transformed input document. This is the result of the interaction between several obscure sections of the XSLT 1.0 specification. The details aren’t important. What is important is that this very unusual behavior looks very much like a bug in the processor if you aren’t familiar with it. If you are familiar with it, fixing it is trivial; just correct the namespace URI to http://www.w3.org/1999/XSL/Transform.

Each template rule is an xsl:template element. The pattern of the rule is placed in the match attribute of the xsl:template element, and the output template is the content of the xsl:template element. All instructions in the template, such as those that select parts of the input tree to include in the output tree, are XSLT elements, identified by the xsl: prefix on their names. Elements that are not in the XSLT namespace (here, those without an xsl: prefix) are literal result elements, which are copied into the result tree.

Listing 17-2 shows a very simple XSLT style sheet with two template rules. The first template rule matches the root element PERIODIC_TABLE. It replaces this element with an html element. The contents of the html element are the results of applying the other templates in the document to the contents of the PERIODIC_TABLE element.

The second template matches ATOM elements. It replaces each ATOM element in the input document with a P element in the output document. The xsl:apply-templates element processes the children of the matched ATOM element, and since no other template in this style sheet matches them, only their text ends up in the output document. Thus, the contents of a P element will be the text (but not the markup) contained in the corresponding ATOM element.

The xsl:stylesheet root element has two required attributes, version and xmlns:xsl, each of which must have exactly the values shown here (1.0 for version and http://www.w3.org/1999/XSL/Transform for xmlns:xsl). I'll discuss the exact syntax of all these elements and attributes below.

Listing 17-2: An XSLT style sheet for the periodic table with two template rules

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="PERIODIC_TABLE">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="ATOM">
    <P>
      <xsl:apply-templates/>
    </P>
  </xsl:template>
</xsl:stylesheet>

The xsl:transform element can be used in place of xsl:stylesheet if you prefer. This is an exact synonym with the same syntax, semantics, and attributes. For example,

<?xml version="1.0"?>
<xsl:transform version="1.0" 
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- templates go here -->
</xsl:transform>

In this book, I will stick to xsl:stylesheet.

Where does the XML transformation happen?

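There are three primary ways to transform XML documents into other formats, such as HTML, with an XSLT style sheet:

1. The XML document and its associated style sheet are both served to the client (Web browser), which transforms the document as specified by the style sheet and presents it to the user.

2. The server applies the XSLT style sheet to the XML document to transform it into some other format (generally HTML) and sends the transformed document to the client.

3. A separate program transforms the original XML document into some other format (often HTML) before the document is placed on the server. Both server and client deal only with the transformed document.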

Each of these three approaches uses different software, although they all use the same XML documents and XSLT style sheets. An ordinary Web server sending XML documents to Internet Explorer is an example of the first approach. A servlet-compatible Web server using the IBM alphaWorks' XML Enabler (http://www.alphaworks.ibm.com/tech/xmlenabler) is an example of the second approach. A human using Michael Kay's command line SAXON program (http://users.iclway.co.uk/mhkay/saxon/) to transform XML documents to HTML documents, then placing the HTML documents on a Web server is an example of the third approach. However, these all use (at least in theory) the same XSLT language.

In this chapter, I emphasize the third approach, primarily because at the time of this writing, specialized converter programs such as Michael Kay's SAXON and the XML Apache Project's Xalan (http://xml.apache.org/xalan/) provide the most complete and accurate implementations of the XSLT specification. Furthermore, this approach offers the broadest compatibility with legacy Web browsers and servers, whereas the first approach requires a more recent browser than most users have, and the second approach requires special Web server software. In practice, though, requiring a different server is not nearly as onerous as requiring a particular client. You yourself can install special server software, but you cannot rely on your visitors to install particular client software.

On the CD-ROM

Xalan is on the CD-ROM in the directory utilities/xalan. SAXON is on the CD-ROM in the directory utilities/saxon.

How to use Xalan

Xalan is a Java 1.1 character mode application. To use it, you'll need a Java 1.1-compatible virtual machine such as Sun's Java Development Kit (JDK), or Java Runtime Environment (JRE), Apple's Macintosh Runtime for Java 2.2 (MRJ), or Microsoft's virtual machine. You'll need to set your CLASSPATH environment variable to include both the xalan.jar and xerces.jar files (both included in the Xalan distribution). On Unix/Linux you can set this in your .cshrc file if you use csh or tcsh or in your .profile file if you use sh, ksh or bash. On Windows 95/98 you can set it in AUTOEXEC.BAT. In Windows NT/2000, set it with the System Control Panel Environment tab.

Tip

If you're using the JRE 1.2 or later, you can just put the xalan.jar and xerces.jar files in your jre/lib/ext directory instead of mucking around with the CLASSPATH environment variable. If you've installed the JDK instead of the JRE on Windows, you may have two jre/lib/ext directories, one somewhere like C:\jdk1.3\jre\lib\ext and the other somewhere like C:\Program Files\Javasoft\jre\1.3\lib\ext. You need to copy the jar archives into both ext directories. Putting one copy in one directory and an alias into the other directory does not work. You must place complete, actual copies into each ext directory.

Note

Although I primarily use Xalan in this chapter, the examples should work with SAXON or any other XSLT processor that implements the November 16, 1999 XSLT 1.0 Recommendation.

The Java class containing the main method for Xalan is org.apache.xalan.xslt.Process. You can run Xalan by typing the following at the shell prompt or in a DOS window:

C:\> java org.apache.xalan.xslt.Process -in 17-1.xml -xsl 17-2.xsl -out 17-3.html

This line runs the java interpreter on the Java class containing the Xalan program's main() method, org.apache.xalan.xslt.Process. The source XML document following the -in flag is 17-1.xml. The XSLT style sheet follows the -xsl flag and is 17-2.xsl here; and the output HTML file follows the -out argument and is named 17-3.html. If the -out argument is omitted, the transformed document will be printed on the console. If the -xsl argument is omitted, Xalan will attempt to use the style sheet named by the xml-stylesheet processing instruction in the prolog of the input XML document.
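The same transformation can also be run from Java code. Later Xalan releases (2.x) and the XSLT 1.0 processor bundled with current JDKs implement the standard javax.xml.transform (TrAX) API; the sketch below assumes that API rather than the org.apache.xalan.xslt.Process class used above, and its class name is hypothetical. It applies Listing 17-2 to Listing 17-1 in memory, or, given three file names, mirrors the -in, -xsl, and -out arguments. The white space in its output may not match Listing 17-3 exactly.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Listing17_2Runner {
    // Listing 17-1.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>\n"
        + "<PERIODIC_TABLE>\n"
        + "  <ATOM STATE=\"GAS\">\n"
        + "    <NAME>Hydrogen</NAME>\n"
        + "    <SYMBOL>H</SYMBOL>\n"
        + "    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>\n"
        + "    <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>\n"
        + "    <BOILING_POINT UNITS=\"Kelvin\">20.28</BOILING_POINT>\n"
        + "    <MELTING_POINT UNITS=\"Kelvin\">13.81</MELTING_POINT>\n"
        + "    <DENSITY UNITS=\"grams/cubic centimeter\">\n"
        + "      <!-- At 300K, 1 atm -->\n"
        + "      0.0000899\n"
        + "    </DENSITY>\n"
        + "  </ATOM>\n"
        + "  <ATOM STATE=\"GAS\">\n"
        + "    <NAME>Helium</NAME>\n"
        + "    <SYMBOL>He</SYMBOL>\n"
        + "    <ATOMIC_NUMBER>2</ATOMIC_NUMBER>\n"
        + "    <ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT>\n"
        + "    <BOILING_POINT UNITS=\"Kelvin\">4.216</BOILING_POINT>\n"
        + "    <MELTING_POINT UNITS=\"Kelvin\">0.95</MELTING_POINT>\n"
        + "    <DENSITY UNITS=\"grams/cubic centimeter\"><!-- At 300K -->\n"
        + "      0.0001785\n"
        + "    </DENSITY>\n"
        + "  </ATOM>\n"
        + "</PERIODIC_TABLE>\n";

    // Listing 17-2.
    static final String XSL =
        "<?xml version=\"1.0\"?>\n"
        + "<xsl:stylesheet version=\"1.0\"\n"
        + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
        + "  <xsl:template match=\"PERIODIC_TABLE\">\n"
        + "    <html><xsl:apply-templates/></html>\n"
        + "  </xsl:template>\n"
        + "  <xsl:template match=\"ATOM\">\n"
        + "    <P><xsl:apply-templates/></P>\n"
        + "  </xsl:template>\n"
        + "</xsl:stylesheet>\n";

    // In-memory equivalent of "-in ... -xsl ... -out ...".
    static String transform(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length == 3) {
            // File-based run, e.g.: java Listing17_2Runner 17-1.xml 17-2.xsl 17-3.html
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File(args[1])));
            t.transform(new StreamSource(new File(args[0])), new StreamResult(new File(args[2])));
        } else {
            System.out.println(transform(XSL, XML));
        }
    }
}
```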

Listing 17-2 transforms input documents to well-formed HTML files as discussed in Chapter 6. However, you can transform from any XML application to any other as long as you can write a style sheet to support the transformation. For example, you can imagine a style sheet that transforms from Vector Markup Language (VML) documents to Scalable Vector Graphics (SVG) documents:

% java org.apache.xalan.xslt.Process -in pinktriangle.vml \
  -xsl VmlToSVG.xsl -out pinktriangle.svg

Most other command line XSLT processors behave similarly, though of course they'll have different command line arguments and options. They may prove slightly easier to use if they're not written in Java since there won't be any need to configure the CLASSPATH.

Tip

If you're using Windows, you can use a stand-alone executable version of SAXON called Instant SAXON (http://users.iclway.co.uk/mhkay/saxon/instant.html) instead. This is a little easier to use because it doesn't require you to mess around with CLASSPATH environment variables. To transform a document with this program, simply place the saxon.exe file in your path and type:

C:\> saxon -o 17-3.html 17-1.xml 17-2.xsl

Listing 17-3 shows the output of running Listing 17-1 through Xalan with the XSLT style sheet in Listing 17-2. Notice that Xalan does not attempt to clean up the HTML it generates, which has a lot of white space. This is not important since ultimately you want to view the file in a Web browser that trims white space. Figure 17-2 shows Listing 17-3 loaded into Netscape Navigator 4.6. Because Listing 17-3 is standard HTML, you don't need an XML-capable browser to view it.

Listing 17-3: The HTML produced by applying the style sheet in Listing 17-2 to the XML in Listing 17-1

<html>
  <P>
    Hydrogen
    H
    1
    1.00794
    20.28
    13.81
      0.0000899
  </P>
  <P>
    Helium
    He
    2
    4.0026
    4.216
    0.95
      0.0001785
  </P>
</html>

Figure 17-2: The page produced by applying the style sheet in Listing 17-2 to the XML document in Listing 17-1.

Direct display of XML files with XSLT style sheets

Instead of preprocessing the XML file, you can send the client both the XML file and the XSLT file that describes how to render it. The client is responsible for applying the style sheet to the document and rendering it accordingly. This is more work for the client, but places much less load on the server. In this case, the XSLT style sheet must transform the document into an XML application the client understands. HTML is a likely choice, though in the future some browsers may understand XSL formatting objects as well.

Attaching an XSLT style sheet to an XML document is easy. Simply insert an xml-stylesheet processing instruction in the prolog immediately after the XML declaration. This processing instruction should have a type attribute with the value text/xml and an href attribute whose value is a URL pointing to the style sheet. For example:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="17-2.xsl"?>

This is also how you attach a CSS style sheet to a document. The only difference here is that the type attribute has the value text/xml instead of text/css.
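A client that honors this processing instruction has to find it in the prolog and read its href pseudo-attribute. The sketch below does that by hand with DOM and a simplified regular expression; the regex and the class and method names are assumptions of this sketch, and the regex does not handle every legal pseudo-attribute form. In production Java code, TransformerFactory.getAssociatedStylesheet performs this lookup for you.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class StylesheetPI {
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>\n"
        + "<PERIODIC_TABLE/>";

    // name="value" or name='value' pairs (simplified pseudo-attribute syntax).
    private static final Pattern PSEUDO_ATTR =
        Pattern.compile("(\\w+)\\s*=\\s*([\"'])(.*?)\\2");

    // Returns the pseudo-attributes of the first xml-stylesheet PI, or an empty map.
    static Map<String, String> find(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Map<String, String> attrs = new HashMap<>();
        // The PI sits in the prolog, i.e. it is a child of the root (document) node.
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                    && ((ProcessingInstruction) n).getTarget().equals("xml-stylesheet")) {
                Matcher m = PSEUDO_ATTR.matcher(((ProcessingInstruction) n).getData());
                while (m.find()) {
                    attrs.put(m.group(1), m.group(3));
                }
                break;
            }
        }
        return attrs;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> pi = find(XML);
        System.out.println("type=" + pi.get("type") + " href=" + pi.get("href"));
    }
}
```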

Note

In the future the more specific MIME media type application/xslt+xml will be available to distinguish XSLT documents from all other XML documents. Once XSLT processors are revised to support this, you will be able to write the xml-stylesheet processing instruction like this instead:

<?xml-stylesheet type="application/xslt+xml" href="17-2.xsl"?>

Internet Explorer 5.0 and 5.5's XSLT support differs from the November 16, 1999 recommendation in several ways. First, it expects that XSLT elements live in the http://www.w3.org/TR/WD-xsl namespace instead of the http://www.w3.org/1999/XSL/Transform namespace, although the xsl prefix is still used. Second, it expects the non-standard MIME type text/xsl in the xml-stylesheet processing instruction rather than text/xml. Finally, it does not implement the default rules for elements that match no template. Consequently, you need to provide a template for each element in the hierarchy starting from the root before trying to view a document in Internet Explorer. Listing 17-4 demonstrates. The three rules match the root node, the root element PERIODIC_TABLE, and the ATOM elements in that order. Figure 17-3 shows the XML document in Listing 17-1 loaded into Internet Explorer 5.5 with this style sheet.

Listing 17-4: The style sheet of Listing 17-2 adjusted to work with Internet Explorer 5.0 and 5.5

<?xml version="1.0"?>
<!-- This is a non-standard style sheet designed just for
     Internet Explorer. It will not work with any standards
     compliant XSLT processor. -->
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="PERIODIC_TABLE">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="ATOM">
    <P>
      <xsl:value-of select="."/>
    </P>
  </xsl:template>
</xsl:stylesheet>

Caution

Ideally, you would use the same XML document both for direct display and for prerendering to HTML. Unfortunately, that would require Microsoft to actually support the real XSLT specification. Microsoft has repeatedly promised to support this, and they have just as repeatedly reneged on those promises.


Figure 17-3: The page produced in Internet Explorer 5.5 by applying the style sheet in Listing 17-4 to the XML document in Listing 17-1.

Internet Explorer also fails to support many other parts of standard XSLT, while offering a number of nonstandard extensions. If you've successfully installed MSXML3 in replace mode, then IE5 can handle most of XSLT 1.0 including the http://www.w3.org/1999/XSL/Transform namespace. However, even this version still has a few bugs, including expecting the text/xsl MIME type instead of text/xml. In the rest of this chapter, I use only standard XSLT and simply prerender the file in HTML before loading it into a Web browser. If you find something in this chapter doesn’t work in Internet Explorer, please complain to Microsoft, not to me.

XSL Templates

Template rules defined by xsl:template elements are the most important part of an XSLT style sheet. These associate particular output with particular input. Each xsl:template element has a match attribute that specifies which nodes of the input document the template is instantiated for.

The content of the xsl:template element is the actual template to be instantiated. A template may contain both text that will appear literally in the output document and XSLT instructions that copy data from the input XML document to the result. Because all XSLT instructions are in the http://www.w3.org/1999/XSL/Transform namespace, it's easy to distinguish them from elements that are literal data to be copied to the output. For example, here is a template that is applied to the root node of the input tree:

<xsl:template match="/">
  <html>
    <head>
    </head>
    <body>
    </body>
  </html>
</xsl:template>

When the XSLT processor reads the input document, the first node it sees is the root. This rule matches that root node, and tells the XSLT processor to emit this text:

<html>
  <head>
  </head>
  <body>
  </body>
</html>

This text is well-formed HTML. Because the XSLT document is itself an XML document, its contents — templates included — must be well-formed XML.

If you were to use the above rule, and only the above rule, in an XSLT style sheet, the output would be limited to the above six tags. That's because no instructions in the rule tell the formatter to move down the tree and look for further matches against the templates in the style sheet.
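You can confirm this behavior by running a style sheet that contains only the root-node rule. The sketch below assumes the javax.xml.transform API bundled with modern Java and uses a trimmed copy of Listing 17-1; the class name is hypothetical. The output contains the HTML skeleton but none of the input's text. (An HTML serializer may also insert a META element into head, which is why the checks look for individual tags rather than an exact string.)

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RootOnly {
    // The single root-node rule from the text; it never calls xsl:apply-templates.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\">"
        + "<html><head></head><body></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Trimmed copy of Listing 17-1.
    static final String XML =
        "<PERIODIC_TABLE><ATOM><NAME>Hydrogen</NAME></ATOM>"
        + "<ATOM><NAME>Helium</NAME></ATOM></PERIODIC_TABLE>";

    static String transform(String xsl, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XSL, XML));
    }
}
```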

The xsl:apply-templates element

To get beyond the root, you have to tell the formatting engine to process the children of the root. In general, to include the content of child nodes, you have to process the nodes of the XML document recursively. The element that does this is xsl:apply-templates. By including xsl:apply-templates in the output template, you tell the formatter to compare each child node of the matched source element against the templates in the style sheet and, if a match is found, output the template for the matched node. The template for the matched node may itself contain xsl:apply-templates elements to search for matches for its children. When the formatting engine processes a node, the node is treated as a complete tree. This is the advantage of the tree structure: each part can be treated the same way as the whole. For example, Listing 17-5 is an XSLT style sheet that uses the xsl:apply-templates element to process the child nodes.

Listing 17-5: An XSLT style sheet that recursively processes the children of the root

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="PERIODIC_TABLE">
    <body>
      <xsl:apply-templates/>
    </body>
  </xsl:template>
  <xsl:template match="ATOM">
    An Atom
  </xsl:template>
</xsl:stylesheet>

When this style sheet is applied to Listing 17-1, here's what happens:

1. The root node is compared with the template rules in the style sheet. It matches the first one.

2. The <html> tag is written out.

3. The xsl:apply-templates element causes the formatting engine to process the child nodes of the root node of the input document.

 A. The first child of the root, the xml-stylesheet processing instruction, is compared with the template rules. It doesn't match any of them, so no output is generated.

 B. The second child of the root node of the input document, the root element PERIODIC_TABLE, is compared with the template rules. It matches the second template rule.

 C. The <body> tag is written out.

 D. The xsl:apply-templates element in the body element causes the formatting engine to process the child nodes of PERIODIC_TABLE.

a. The first child of the PERIODIC_TABLE element, that is, the Hydrogen ATOM element, is compared with the template rules. It matches the third template rule.

b. The text "An Atom" is output.

c. The second child of the PERIODIC_TABLE element, that is, the Helium ATOM element, is compared with the template rules. It matches the third template rule.

d. The text "An Atom" is output.

 E. The </body> tag is written out.

4. The </html> tag is written out.

The end result is:

<html>
<body>
    An Atom
    An Atom
</body>
</html>
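The walk-through above can be checked by running Listing 17-5 directly. This sketch assumes the javax.xml.transform API bundled with modern Java and a trimmed copy of Listing 17-1 that keeps the xml-stylesheet processing instruction; the class name is hypothetical. Exact white space depends on the serializer, but "An Atom" should appear once per ATOM element, and no element content should leak through.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Listing17_5Walk {
    // Listing 17-5.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><html><xsl:apply-templates/></html></xsl:template>"
        + "<xsl:template match=\"PERIODIC_TABLE\"><body><xsl:apply-templates/></body></xsl:template>"
        + "<xsl:template match=\"ATOM\">An Atom</xsl:template>"
        + "</xsl:stylesheet>";

    // Trimmed copy of Listing 17-1: same prolog PI, two ATOMs, fewer children.
    static final String XML =
        "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>"
        + "<PERIODIC_TABLE>"
        + "<ATOM><NAME>Hydrogen</NAME><ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT></ATOM>"
        + "<ATOM><NAME>Helium</NAME><ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT></ATOM>"
        + "</PERIODIC_TABLE>";

    static String transform(String xsl, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XSL, XML));
    }
}
```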

The select attribute

To replace the text "An Atom" with the name of the ATOM element as given by its NAME child, you need to specify that templates should be applied to the NAME children of the ATOM element. To choose a particular set of children instead of all children you supply xsl:apply-templates with a select attribute designating the children to be selected. For example:

<xsl:template match="ATOM">
  <xsl:apply-templates select="NAME"/>
</xsl:template>

The select attribute uses the same kind of patterns as the match attribute of the xsl:template element. For now, I'll stick to simple names of elements; but in the section on patterns for matching and selecting later in this chapter, you'll see many more possibilities for both select and match. If no select attribute is present, all child element, text, comment, and processing instruction nodes are selected. (Attribute and namespace nodes are not selected.)

The result of adding this rule to the style sheet of Listing 17-5 and applying it to Listing 17-1 is this:

<html>
<body>
  Hydrogen
  Helium
</body>
</html>
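Here is the same experiment with the select attribute in place. The sketch assumes the javax.xml.transform API bundled with modern Java; the class name is hypothetical, and the input is a trimmed copy of Listing 17-1 that keeps an ATOMIC_WEIGHT child so you can see that children other than NAME are skipped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SelectNames {
    // Listing 17-5 with the ATOM rule replaced by one that selects only NAME children.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><html><xsl:apply-templates/></html></xsl:template>"
        + "<xsl:template match=\"PERIODIC_TABLE\"><body><xsl:apply-templates/></body></xsl:template>"
        + "<xsl:template match=\"ATOM\"><xsl:apply-templates select=\"NAME\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Trimmed copy of Listing 17-1.
    static final String XML =
        "<PERIODIC_TABLE>"
        + "<ATOM><NAME>Hydrogen</NAME><ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT></ATOM>"
        + "<ATOM><NAME>Helium</NAME><ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT></ATOM>"
        + "</PERIODIC_TABLE>";

    static String transform(String xsl, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XSL, XML));
    }
}
```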

Computing the Value of a Node with xsl:value-of

The xsl:value-of element computes the value of something (most of the time, though not always, something in the input document) and copies it into the output document. The select attribute of the xsl:value-of element specifies exactly which something's value is being computed.

For example, suppose you want to replace the literal text An Atom with the name of the ATOM element as given by the contents of its NAME child. You can replace An Atom with <xsl:value-of select="NAME"/> like this:

<xsl:template match="ATOM">
  <xsl:value-of select="NAME"/>
</xsl:template>

Then, when you apply the style sheet to Listing 17-1, this text is generated:

<html>
<body>
  Hydrogen
  Helium
</body>
</html>

The item whose value is selected, the NAME element in this example, is relative to the current node. The current node is the item matched by the template, the particular ATOM element in this example. Thus, when the Hydrogen ATOM is matched by <xsl:template match="ATOM">, the Hydrogen ATOM's NAME is selected by xsl:value-of. When the Helium ATOM is matched by <xsl:template match="ATOM">, the Helium ATOM's NAME is selected by xsl:value-of.
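The idea that a select expression is resolved relative to the current node can be imitated outside XSLT. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports only a small subset of XPath. The two-atom document is a trimmed-down, hypothetical stand-in for Listing 17-1, and the function name value_of_name is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for Listing 17-1.
DOC = """<PERIODIC_TABLE>
  <ATOM STATE="GAS"><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>
  <ATOM STATE="GAS"><NAME>Helium</NAME><SYMBOL>He</SYMBOL></ATOM>
</PERIODIC_TABLE>"""

root = ET.fromstring(DOC)

def value_of_name(current):
    # Like <xsl:value-of select="NAME"/>: the path "NAME" is resolved against
    # whichever element is current, not against the whole document.
    # findtext() returns None when the current node has no NAME child.
    return current.findtext("NAME")

# Like <xsl:apply-templates select="ATOM"/>: each ATOM becomes current in turn.
names = [value_of_name(atom) for atom in root.findall("ATOM")]
print(names)  # ['Hydrogen', 'Helium']
```

Calling value_of_name on the PERIODIC_TABLE element itself returns None, because NAME is not a child of that node.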

The value of a node is always a string, possibly an empty string. The exact contents of this string depend on the type of the node. The most common type of node is element, and the value of an element node is particularly simple. It's the concatenation of all the character data (but not markup!) between the element's start tag and end tag. For example, the first ATOM element in Listing 17-1 is as follows:

  <ATOM STATE="GAS">
    <NAME>Hydrogen</NAME>
    <SYMBOL>H</SYMBOL>
    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
    <BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
    <DENSITY UNITS="grams/cubic centimeter">
      <!-- At 300K, 1 atm -->
      0.0000899
    </DENSITY>
  </ATOM>

The value of this element is shown below:

  Hydrogen
  H
  1
  1.00794
  20.28
  13.81
    0.0000899

I calculated this value by stripping out all the tags and comments. Everything else, including white space, was left intact. The values of the other six node types are calculated similarly, mostly in obvious ways. Table 17-1 summarizes them.
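The same "strip the markup, keep the character data" calculation can be checked mechanically. In this minimal Python sketch, ElementTree's itertext() plays the role of the element's value. The XML is the Hydrogen ATOM copied from above. The one assumption is ElementTree's default parser, which discards comments while parsing, so the comment's text never reaches the result. The sketch compares whitespace-separated tokens to avoid depending on exact indentation.

```python
import xml.etree.ElementTree as ET

# The first ATOM element of Listing 17-1, as shown in the text.
ATOM = """<ATOM STATE="GAS">
    <NAME>Hydrogen</NAME>
    <SYMBOL>H</SYMBOL>
    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
    <BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
    <DENSITY UNITS="grams/cubic centimeter">
      <!-- At 300K, 1 atm -->
      0.0000899
    </DENSITY>
  </ATOM>"""

def string_value(elem):
    # Concatenate all character data in the element and its descendants.
    # Tags and attribute values contribute nothing; comments were already
    # dropped by ElementTree's default parser.
    return "".join(elem.itertext())

value = string_value(ET.fromstring(ATOM))
print(value.split())
```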

Table 17-1: Values of Nodes

Node Type:

Value:

Root

The value of the root element

Element

The concatenation of all parsed character data contained in the element, including character data in any of the descendants of the element

Text

The text of the node; essentially the node itself

Attribute

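The normalized attribute value as specified by Section 3.3.3 of the XML 1.0 recommendation; basically the attribute value after entity and character references are resolved and white space characters are converted to spaces (with leading and trailing white space also stripped, and internal runs of spaces collapsed, for attributes declared with a type other than CDATA); does not include the name of the attribute, the equals sign, or the quotation marks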

Namespace

The URI of the namespace

Processing instruction

The data in the processing instruction; does not include the processing instruction's target, <? or ?>

Comment

The text of the comment, <!-- and --> not included

Processing Multiple Elements with xsl:for-each

The xsl:value-of element should only be used in contexts where it is obvious which node's value is being taken. If there are multiple possible items that could be selected, then only the first one will be chosen. For instance, this is a poor rule because a typical PERIODIC_TABLE element contains more than one ATOM:

<xsl:template match="PERIODIC_TABLE">
  <xsl:value-of select="ATOM"/>
</xsl:template>

There are two ways of processing multiple elements in turn. The first method you've already seen. Simply use xsl:apply-templates with a select attribute that chooses the particular elements that you want to include, like this:

  <xsl:template match="PERIODIC_TABLE">
    <xsl:apply-templates select="ATOM"/>
  </xsl:template>
  <xsl:template match="ATOM">
    <xsl:value-of select="."/>
  </xsl:template>

The select="." in the second template tells the XSLT processor to take the value of the matched element, ATOM in this example.

The second option is xsl:for-each, which processes each node chosen by its select attribute in turn. Unlike the first approach, it requires no additional template. For example:

<xsl:template match="PERIODIC_TABLE">
  <xsl:for-each select="ATOM">
    <xsl:value-of select="."/>
  </xsl:for-each>
</xsl:template>
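The difference between taking one value and iterating can be illustrated with ElementTree, under the assumption that its find() (first match in document order) and findall() (every match) are reasonable analogues of xsl:value-of and xsl:for-each here. The two-atom document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-atom document in the shape of Listing 17-1.
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>
  <ATOM><NAME>Helium</NAME><ATOMIC_NUMBER>2</ATOMIC_NUMBER></ATOM>
</PERIODIC_TABLE>"""

table = ET.fromstring(DOC)

def string_value(elem):
    return "".join(elem.itertext())

# <xsl:value-of select="ATOM"/>: only the first ATOM in document order counts.
first_only = string_value(table.find("ATOM"))

# <xsl:for-each select="ATOM"><xsl:value-of select="."/></xsl:for-each>:
# every selected ATOM is visited, in document order.
each = [string_value(atom) for atom in table.findall("ATOM")]

print(first_only)  # Hydrogen1
print(each)        # ['Hydrogen1', 'Helium2']
```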

Patterns for Matching Nodes

The match attribute of the xsl:template element supports a complex syntax that allows you to express exactly which nodes you do and do not want to match. The select attribute of xsl:apply-templates, xsl:value-of, xsl:for-each, xsl:copy-of, and xsl:sort supports an even more powerful superset of this syntax called XPath that allows you to express exactly which nodes you do and do not want to select. Various patterns for matching and selecting nodes are discussed below.

Matching the root node

In order that the output document be well-formed, the first thing output from an XSL transformation should be the output document's root element. Consequently, XSLT style sheets generally start with a rule that applies to the root node. To specify the root node in a rule, you give its match attribute the value "/". For example:

<xsl:template match="/">
  <DOCUMENT>
    <xsl:apply-templates/>
  </DOCUMENT>
</xsl:template>

This rule applies to the root node and only the root node of the input tree. When the root node is read, the tag <DOCUMENT> is output, the children of the root node are processed, then the </DOCUMENT> tag is output. This rule overrides the default rule for the root node. Listing 17-6 shows a style sheet with a single rule that applies to the root node.

Listing 17-6: An XSLT style sheet with one rule for the root node

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <html>
        <head>
          <title>Atomic Number vs. Atomic Weight</title>
        </head>
        <body>
          <table>
            Atom data will go here
          </table>
        </body>
      </html>
    </xsl:template>
</xsl:stylesheet>

Because this style sheet only provides a rule for the root node, and because that rule's template does not specify any further processing of child nodes, only literal output that's included in the template is inserted in the resulting document. In other words, the result of applying the style sheet in Listing 17-6 to Listing 17-1 (or any other well-formed XML document) is this:

<html>
<head>
<title>Atomic Number vs. Atomic Weight</title>
</head>
<body>
<table>
            Atom data will go here
          </table>
</body>
</html>

Matching element names

As previously mentioned, the most basic pattern contains a single element name that matches all elements with that name. For example, this template matches ATOM elements and makes their ATOMIC_NUMBER children bold:

<xsl:template match="ATOM">
  <b><xsl:value-of select="ATOMIC_NUMBER"/></b>
</xsl:template>

Listing 17-7 demonstrates a style sheet that expands on Listing 17-6. First, an xsl:apply-templates element is included in the template rule for the root node. This rule uses a select attribute to ensure that only PERIODIC_TABLE elements are processed.

Second, a rule that only applies to PERIODIC_TABLE elements is created using match="PERIODIC_TABLE". This rule sets up the header for the table, and then applies templates to form the body of the table from ATOM elements.

Finally, the ATOM rule specifically selects the ATOM element's NAME, ATOMIC_NUMBER, and ATOMIC_WEIGHT child elements with <xsl:value-of select="NAME"/>, <xsl:value-of select="ATOMIC_NUMBER"/>, and <xsl:value-of select="ATOMIC_WEIGHT"/>. These are wrapped up inside HTML's tr and td elements, so that the end result is a table of atomic numbers matched to atomic weights. Figure 17-4 shows the output of applying the style sheet in Listing 17-7 to the complete periodic table document displayed in Netscape Navigator.

One thing you may wish to note about this style sheet: The exact order of the NAME, ATOMIC_NUMBER, and ATOMIC_WEIGHT elements in the input document is irrelevant. They appear in the output in the order they were selected; that is, first name, then number, then weight. By contrast, the atoms themselves appear in the same order as in the input document, which is alphabetical. Later, you'll see how to use an xsl:sort element to change that so you can arrange the atoms in the more conventional atomic number order.

Listing 17-7: Templates applied to specific classes of element with select

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <html>
        <head>
          <title>Atomic Number vs. Atomic Weight</title>
        </head>
        <body>
          <xsl:apply-templates select="PERIODIC_TABLE"/>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="PERIODIC_TABLE">
      <h1>Atomic Number vs. Atomic Weight</h1>
      <table>
        <tr>
          <th>Element</th>
          <th>Atomic Number</th>
          <th>Atomic Weight</th>
        </tr>
        <xsl:apply-templates select="ATOM"/>
      </table>
    </xsl:template>
    <xsl:template match="ATOM">
      <tr>
        <td><xsl:value-of select="NAME"/></td>
        <td><xsl:value-of select="ATOMIC_NUMBER"/></td>
        <td><xsl:value-of select="ATOMIC_WEIGHT"/></td>
      </tr>
    </xsl:template>
</xsl:stylesheet>

Figure 17-4: A table showing atomic number versus atomic weight in Netscape Navigator

Wild cards

Sometimes you want a single template to apply to more than one element. You can indicate that a template matches all elements by using the asterisk wildcard (*) in place of an element name in the match attribute. For example, this template says that every element should be wrapped in a P element:

    <xsl:template match="*">
      <P>
        <xsl:value-of select="."/>
      </P>
    </xsl:template>

Of course, this is probably more than you want. You'd like to use the template rules already defined for PERIODIC_TABLE and ATOM elements as well as the root node, and use this rule only for the other elements. Fortunately, you can. When two rules both match a single node, by default the more specific one takes precedence. In this case, that means that ATOM elements will use the template with match="ATOM" instead of a template that merely has match="*". However, NAME, BOILING_POINT, ATOMIC_NUMBER, and other elements that don't match a more specific template will cause the match="*" template to activate.
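"More specific" has a precise meaning. XSLT 1.0 (section 5.5) assigns each pattern a default priority: a plain name such as ATOM or @UNITS gets 0, a prefix:* namespace wildcard gets -0.25, a bare node test such as * or text() gets -0.5, and anything else (multi-step paths, predicates) gets 0.5. The following Python sketch encodes those defaults for a few pattern shapes and picks a rule for an element name. It ignores import precedence and explicit priority attributes, and it recognizes only bare names and * when matching. The helper names are hypothetical.

```python
def default_priority(pattern):
    """Default priority of one (non-union) XSLT 1.0 pattern; a sketch, not a parser."""
    step = pattern[1:] if pattern.startswith("@") else pattern
    if "/" in pattern or "[" in pattern:
        return 0.5    # multi-step paths and predicates
    if step in ("*", "node()", "text()", "comment()", "processing-instruction()"):
        return -0.5   # a bare node test
    if step.endswith(":*"):
        return -0.25  # prefix:* namespace wildcard
    return 0.0        # a plain name such as ATOM or @UNITS

def choose_template(patterns, tag):
    """Pick the rule for an element named tag among bare-name and '*' patterns."""
    candidates = [p for p in patterns if p in ("*", tag)]
    if not candidates:
        return None
    # On a tie, XSLT 1.0 lets a processor report an error or use the rule that
    # occurs last in the style sheet; iterating in reverse makes max() do the latter.
    return max(reversed(candidates), key=default_priority)

rules = ["*", "PERIODIC_TABLE", "ATOM"]
print(choose_template(rules, "ATOM"))  # ATOM
print(choose_template(rules, "NAME"))  # *
```

The same table explains why a rule such as match="ATOM[MELTING_POINT]" (priority 0.5) beats a plain match="ATOM" (priority 0) for atoms that have a melting point.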

You can place a namespace prefix in front of the asterisk to indicate that only elements in a particular namespace should be matched. For example, this template matches all SVG elements, presuming that the prefix svg is mapped to the SVG namespace URI http://www.w3.org/2000/svg in the style sheet.

    <xsl:template match="svg:*">
      <DIV>
        <xsl:value-of select="."/>
      </DIV>
    </xsl:template>

Of course, Listing 17-1 doesn't contain any elements from this namespace, so this template wouldn't produce any output. However, it might when applied to a different document that did include some SVG.
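What a namespace wildcard really tests is the namespace URI, not the prefix. ElementTree makes this visible by naming elements {URI}local-name, and from Python 3.8 it accepts {URI}* in paths as "any element in this namespace". The following sketch uses that form as a rough analogue of svg:*. The mixed document is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"

# Hypothetical document that mixes unprefixed elements with SVG elements.
DOC = """<DOCUMENT xmlns:svg="http://www.w3.org/2000/svg">
  <P>Some text</P>
  <svg:svg width="20" height="20">
    <svg:circle cx="10" cy="10" r="5"/>
  </svg:svg>
</DOCUMENT>"""

root = ET.fromstring(DOC)

# "{URI}*" (Python 3.8+) selects every element in that namespace, whatever
# prefix the document happened to use for it; ".//" searches at any depth.
svg_elements = root.findall(".//{" + SVG + "}*")
print([e.tag for e in svg_elements])
```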

Matching children with /

You're not limited to the children of the current node in match attributes. You can use the / symbol to match specified hierarchies of elements. Used alone, the / symbol refers to the root node. However, you can use it between two names to indicate that the second is the child of the first. For example, ATOM/NAME refers to NAME elements that are children of ATOM elements.

In xsl:template elements, this enables you to match only some of the elements of a given kind. For example, this template rule wraps SYMBOL elements that are children of ATOM elements in a strong element. It does nothing to SYMBOL elements that are not direct children of ATOM elements.

<xsl:template match="ATOM/SYMBOL">
  <strong><xsl:value-of select="."/></strong>
</xsl:template>

Caution

Remember that this rule selects SYMBOL elements that are children of ATOM elements, not ATOM elements that have SYMBOL children. In other words, the . in <xsl:value-of select="."/> refers to the SYMBOL and not to the ATOM.

You can specify deeper matches by stringing patterns together. For example, PERIODIC_TABLE/ATOM/NAME selects NAME elements whose parent is an ATOM element whose parent is a PERIODIC_TABLE element.

You can also use the * wild card to substitute for an arbitrary element name in a hierarchy. For example, this template rule applies to all SYMBOL elements that are grandchildren of a PERIODIC_TABLE element.

<xsl:template match="PERIODIC_TABLE/*/SYMBOL">
  <strong><xsl:value-of select="."/></strong>
</xsl:template>
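One way to picture how a processor decides whether a node matches a pattern like ATOM/SYMBOL or PERIODIC_TABLE/*/SYMBOL is to check the steps from right to left. The node must fit the last step, its parent the step before, and so on. This Python sketch implements only that idea, with no support for //, @, or []. The NOTE element in the sample is invented solely to provide a SYMBOL that is not a child of an ATOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; NOTE is invented to hold a SYMBOL outside any ATOM.
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>
  <NOTE><SYMBOL>?</SYMBOL></NOTE>
</PERIODIC_TABLE>"""

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def matches(elem, pattern):
    """Test a relative pattern such as ATOM/SYMBOL or PERIODIC_TABLE/*/SYMBOL."""
    node = elem
    for step in reversed(pattern.split("/")):
        # Each step must be satisfied by the current node; * accepts any name.
        if node is None or (step != "*" and node.tag != step):
            return False
        node = parent_of.get(node)  # move up for the next step to the left
    return True

for e in root.iter():
    hits = [p for p in ("ATOM/SYMBOL", "PERIODIC_TABLE/*/SYMBOL") if matches(e, p)]
    if hits:
        print(e.tag, e.text, hits)
```

Only SYMBOL elements ever match; ATOM elements do not, which is the point of the caution above.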

Finally, as you saw above, a / by itself matches the root node of the document, and a / at the start of a pattern anchors the rest of the pattern there. For example, this rule applies to a PERIODIC_TABLE element only when it is the root element of the document:

<xsl:template match="/PERIODIC_TABLE">
  <html><xsl:apply-templates/></html>
</xsl:template>

While / refers to the root node, /* refers to the root element, whatever it is. For example, this template doesn't care whether the root element is PERIODIC_TABLE, DOCUMENT, or SCHENECTADY. It produces the same output in all cases.

<xsl:template match="/*">
  <html>
    <head>
      <title>Atomic Number vs. Atomic Weight</title>
    </head>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>

Matching descendants with //

Sometimes, especially with an uneven hierarchy, you may find it easier to bypass intermediate nodes and simply select all the elements of a given type, whether they're immediate children, grandchildren, great-grandchildren, or what have you. The double slash, //, refers to a descendant element at an arbitrary level. For example, this template rule applies to all NAME descendants of PERIODIC_TABLE, no matter how deep:

<xsl:template match="PERIODIC_TABLE//NAME">
  <i><xsl:value-of select="."/></i>
</xsl:template>

The periodic table example is fairly shallow, but this trick becomes more important in deeper hierarchies, especially when an element can contain other elements of its own type (for example, if an ATOM could contain other ATOM elements).

The // operator at the beginning of a pattern selects any descendant of the root node. For example, this template rule processes all ATOMIC_NUMBER elements while completely ignoring their location:

<xsl:template match="//ATOMIC_NUMBER">
  <i><xsl:value-of select="."/></i>
</xsl:template>
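The contrast between a fixed-depth path and //, which reaches any depth, is easy to see with ElementTree's path syntax. There .// means "descendant at any depth", and a leading // on an element is rejected, so the path starts with a dot. The GROUP element in this sketch is invented to make the hierarchy uneven.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately uneven hierarchy: one NAME sits inside an
# invented GROUP element, a level deeper than the other.
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME></ATOM>
  <GROUP><ATOM><NAME>Helium</NAME></ATOM></GROUP>
</PERIODIC_TABLE>"""

table = ET.fromstring(DOC)

# Fixed depth: only NAME elements reached through an ATOM child of the table.
fixed = [n.text for n in table.findall("ATOM/NAME")]

# Any depth, roughly PERIODIC_TABLE//NAME.
anywhere = [n.text for n in table.findall(".//NAME")]

print(fixed)     # ['Hydrogen']
print(anywhere)  # ['Hydrogen', 'Helium']
```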

Matching by ID

You may want to apply a particular style to a particular single element without changing all other elements of that type. The simplest way to do that in XSLT is to attach a style to the element's ID type attribute. This is done with the id() function, whose argument is the ID value in single quotes. For example, this rule makes the element with the ID e47 bold:

<xsl:template match="id('e47')">
  <b><xsl:value-of select="."/></b>
</xsl:template>

This assumes, of course, that the elements that you want to select in this fashion have an attribute declared as type ID in the source document's DTD. This may not be the case, however. For one thing, many documents do not have DTDs. They're merely well-formed, not valid. And even if they have a DTD, there's no guarantee that any element has an ID type attribute.

Cross-Reference

ID-type attributes are not simply attributes with the name ID. ID type attributes are discussed in Chapter 11.

Matching attributes with @

As you saw in Chapter 5, the @ sign matches against attributes and selects nodes according to attribute names. Simply prefix the name of the attribute that you want to select with the @ sign. For example, this template rule matches UNITS attributes, and wraps them in an I element.

    <xsl:template match="@UNITS">
      <I><xsl:value-of select="."/></I>
    </xsl:template>

However, merely adding this rule to the style sheet will not automatically produce italicized units in the output, because attributes are not children of the elements that contain them. Therefore, by default, when an XSLT processor is walking the tree it does not see attribute nodes. You have to explicitly process them using xsl:apply-templates with an appropriate select attribute. Listing 17-8 demonstrates with a style sheet that outputs a table of atomic numbers versus melting points. Not only is the value of the MELTING_POINT element written out, so is the value of its UNITS attribute. This is selected by <xsl:apply-templates select="@UNITS"/> in the template rule for MELTING_POINT elements.

Listing 17-8: An XSLT style sheet that selects the UNITS attribute with @

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/PERIODIC_TABLE">
      <html>
        <body>
          <h1>Atomic Number vs. Melting Point</h1>
          <table>
              <tr>
                <th>Element</th>
                <th>Atomic Number</th>
                <th>Melting Point</th>
              </tr>
              <xsl:apply-templates/>
          </table>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="ATOM">
      <tr>
        <td><xsl:value-of select="NAME"/></td>
        <td><xsl:value-of select="ATOMIC_NUMBER"/></td>
        <td><xsl:apply-templates select="MELTING_POINT"/></td>
      </tr>
    </xsl:template>
    <xsl:template match="MELTING_POINT">
      <xsl:value-of select="."/>
      <xsl:apply-templates select="@UNITS"/>
    </xsl:template>
    <xsl:template match="@UNITS">
      <I><xsl:value-of select="."/></I>
    </xsl:template>
</xsl:stylesheet>

Recall that the value of an attribute node is simply the normalized string value of the attribute. Once you apply the style sheet in Listing 17-8, ATOM elements come out formatted like this:

  <tr>
<td>Hydrogen</td><td>1</td><td>13.81<I>Kelvin</I></td>
</tr>
  <tr>
<td>Helium</td><td>2</td><td>0.95<I>Kelvin</I></td>
</tr>

You can combine attributes with elements using the various hierarchy operators. For example, the pattern BOILING_POINT/@UNITS refers to the UNITS attribute of a BOILING_POINT element, and ATOM/*/@UNITS matches any UNITS attribute of a child element of an ATOM element. When matching against attributes in template rules, remember that what's being matched is the attribute node, not the element that contains it. Confusing the two is a very common mistake. For example, consider this rule, which attempts to apply templates to all child elements that have UNITS attributes:

<xsl:template match="ATOM">
  <xsl:apply-templates select="@UNITS"/>
</xsl:template>

What it actually does is apply templates to the nonexistent UNITS attributes of ATOM elements.
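The mistake is easy to see when you ask each element for its own attributes. In this Python sketch, ElementTree's get() stands in for select="@UNITS" evaluated on a given element, and the predicate *[@UNITS] selects child elements that carry the attribute. The ATOM is a trimmed copy of the Hydrogen element from Listing 17-1, with the DENSITY comment removed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Hydrogen ATOM from Listing 17-1.
ATOM = """<ATOM STATE="GAS">
  <NAME>Hydrogen</NAME>
  <BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT>
  <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
  <DENSITY UNITS="grams/cubic centimeter">0.0000899</DENSITY>
</ATOM>"""

atom = ET.fromstring(ATOM)

# select="@UNITS" with ATOM as context: ATOM itself has no UNITS attribute.
print(atom.get("UNITS"))  # None

# select="*[@UNITS]": the child elements that do carry a UNITS attribute.
with_units = atom.findall("*[@UNITS]")
print([e.tag for e in with_units])  # ['BOILING_POINT', 'MELTING_POINT', 'DENSITY']

# The attribute values themselves, as */@UNITS would select them in XPath.
print([e.get("UNITS") for e in with_units])
```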

You can also use the @* wild card to match all attributes of an element, for example BOILING_POINT/@* to match all attributes of BOILING_POINT elements. You can also add a namespace prefix after the @ to match all attributes in a declared namespace. For instance, @xlink:* matches all the XLink attributes, such as xlink:show, xlink:type, and xlink:href, assuming the xlink prefix is mapped to the http://www.w3.org/1999/xlink XLink namespace URI.

Matching comments with comment()

Most of the time you should simply ignore comments in XML documents. Making comments an essential part of a document is a very bad idea. Nonetheless, XSLT does provide a means to match a comment if you absolutely have to.

To match a comment, use the comment() pattern. Although this pattern has function-like parentheses, it never actually takes any arguments. For example, this template rule italicizes all comments:

    <xsl:template match="comment()">
      <i><xsl:value-of select="."/></i>
    </xsl:template>

To distinguish between different comments, you have to look at the comments' parent and ancestors. For example, recall that a DENSITY element looks like this:

    <DENSITY UNITS="grams/cubic centimeter">
      <!-- At 300K, 1 atm -->
      0.0000899
    </DENSITY>

You can use the hierarchy operators to select particular comments. For example, this rule only matches comments that occur inside DENSITY elements:

<xsl:template match="DENSITY/comment()">
  <i><xsl:value-of select="."/></i>
</xsl:template>

The only reason Listing 17-1 uses a comment to specify conditions instead of an attribute or element is precisely for this example. In practice, you should never put important information in comments. The real reason XSLT allows you to select comments is so that a style sheet can transform from one XML application to another while leaving the comments intact. Any other use indicates a poorly designed original document. The following rule matches all comments, and copies them back out again using the xsl:comment element.

<xsl:template match="comment()">
  <xsl:comment><xsl:value-of select="."/></xsl:comment>
</xsl:template>
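Comments only become nodes you can select if the parser keeps them. ElementTree discards them by default. With TreeBuilder(insert_comments=True) (Python 3.8 and later), it keeps comments inside the root element as nodes whose tag is the ET.Comment factory. This Python sketch uses that option to approximate DENSITY/comment() and then rebuilds the comment for output, roughly as xsl:comment does. The DENSITY element is copied from the text.

```python
import xml.etree.ElementTree as ET

DENSITY = """<DENSITY UNITS="grams/cubic centimeter">
      <!-- At 300K, 1 atm -->
      0.0000899
    </DENSITY>"""

# Keep comments as tree nodes instead of discarding them (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
density = ET.fromstring(DENSITY, parser=parser)

# Roughly DENSITY/comment(): the comment children of a DENSITY element.
comments = [c for c in density if c.tag is ET.Comment]
print([c.text for c in comments])  # [' At 300K, 1 atm ']

# Roughly <xsl:comment><xsl:value-of select="."/></xsl:comment>.
copied = ET.Comment(comments[0].text)
print(ET.tostring(copied, encoding="unicode"))  # <!-- At 300K, 1 atm -->
```

The number 0.0000899 ends up as the comment node's tail text, which is how ElementTree records character data that follows a child node.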

Matching processing instructions with processing-instruction()

When it comes to writing structured, intelligible, maintainable XML, processing instructions aren't much better than comments. However, there are occasional genuine needs for them, including attaching style sheets to documents.

The processing-instruction() function matches processing instructions. The argument to processing-instruction() is a quoted string giving the target of the processing instructions to match. If you do not include an argument, any processing instruction matches. For example, this rule matches the processing instruction children of the root node (most likely the xml-stylesheet processing instruction). The xsl:processing-instruction element inserts a processing instruction with the specified name and value in the output document.

<xsl:template match="/processing-instruction()">
  <xsl:processing-instruction name="xml-stylesheet">
    type="text/xsl" href="auto.xsl"
  </xsl:processing-instruction>
</xsl:template>

This rule also matches the xml-stylesheet processing instruction, but by its name:

<xsl:template
  match="processing-instruction('xml-stylesheet')">
  <xsl:processing-instruction name="xml-stylesheet">
    <xsl:value-of select="."/>
  </xsl:processing-instruction>
</xsl:template>

In fact, one of the primary reasons for distinguishing between the root element and the root node is so that processing instructions from the prolog can be read and processed. Although the xml-stylesheet processing instruction uses a name = value syntax, XSL does not consider these to be attributes because processing instructions are not elements. The value of a processing instruction is simply everything between the white space following its name and the closing ?>.
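Python's xml.dom.minidom keeps prolog processing instructions as children of its Document object. That makes it a convenient way to see the root node/root element split, and to see that a PI's "attributes" are only part of one data string. In this sketch, the Document object stands in for the XPath root node. The one-line document is hypothetical.

```python
from xml.dom import minidom

# Hypothetical document whose prolog holds an xml-stylesheet instruction.
DOC = ('<?xml-stylesheet type="text/xsl" href="auto.xsl"?>'
       '<PERIODIC_TABLE><ATOM/></PERIODIC_TABLE>')

doc = minidom.parseString(DOC)  # the Document plays the part of the root node

# The processing instruction is a child of the root node, not of the root element.
pis = [n for n in doc.childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
pi = pis[0]
print(pi.target)                    # xml-stylesheet
print(pi.data)                      # type="text/xsl" href="auto.xsl"
print(doc.documentElement.tagName)  # PERIODIC_TABLE
```

The pseudo-attributes type and href come back as one undivided string; nothing parses them as attributes.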

Matching text nodes with text()

Text nodes are generally ignored as nodes, although their values are included as part of the value of a selected element. However, the text() operator does enable you to specifically select the text child of an element. Despite the parentheses, this operator takes no arguments. For example, this rule emboldens all text:

<xsl:template match="text()">
  <b><xsl:value-of select="."/></b>
</xsl:template>

The main reason this operator exists is for the default rules. XSLT processors must behave as though the following built-in rule, which covers attribute nodes as well as text nodes, were present whether the author specifies it or not:

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

This means that whenever a template is applied to a text node, the text of the node is output. If you do not want the default behavior, you can override it. For example, including the following empty template rule in your style sheet will prevent text nodes from being output unless specifically matched by another rule.

<xsl:template match="text()">
</xsl:template>
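The effect of the built-in rules, and of overriding the text rule with an empty template, can be imitated in a few lines. The sketch follows the XSLT 1.0 built-in behavior: for elements, process the children; for text, copy it. A copy_text flag stands in for the empty text() template. The function name apply_templates and the one-atom document are hypothetical, and this is not how a real XSLT processor is built.

```python
import xml.etree.ElementTree as ET

DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>
</PERIODIC_TABLE>"""

def apply_templates(elem, copy_text=True):
    """Mimic the built-in rules: elements recurse, text nodes are copied.

    copy_text=False plays the part of an empty <xsl:template match="text()"/>.
    ElementTree stores text as elem.text (before the first child) and
    child.tail (after each child), so both are visited in document order.
    """
    out = []
    if copy_text and elem.text:
        out.append(elem.text)
    for child in elem:
        out.append(apply_templates(child, copy_text))
        if copy_text and child.tail:
            out.append(child.tail)
    return "".join(out)

root = ET.fromstring(DOC)
print(repr(apply_templates(root)))                   # all text, markup removed
print(repr(apply_templates(root, copy_text=False)))  # ''
```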

Using the or operator |

The vertical bar (|) allows a template rule to match multiple patterns. If a node matches one pattern or the other, it will activate the template. For example, this template rule matches both ATOMIC_NUMBER and ATOMIC_WEIGHT elements:

<xsl:template match="ATOMIC_NUMBER|ATOMIC_WEIGHT">
  <B><xsl:apply-templates/></B>
</xsl:template>

You can include white space around the | if that makes the code clearer. For example,

<xsl:template match="ATOMIC_NUMBER | ATOMIC_WEIGHT">
  <B><xsl:apply-templates/></B>
</xsl:template>

You can also use more than two patterns in sequence. For example, this template rule matches ATOMIC_NUMBER, ATOMIC_WEIGHT, and SYMBOL elements:

<xsl:template match="ATOMIC_NUMBER | ATOMIC_WEIGHT | SYMBOL">
  <B><xsl:apply-templates/></B>
</xsl:template>

The / operator binds more tightly than the | operator. Thus, the following template rule matches an ATOMIC_NUMBER child of an ATOM, or an ATOMIC_WEIGHT of unspecified parentage. It does not restrict both ATOMIC_NUMBER and ATOMIC_WEIGHT to children of an ATOM.

<xsl:template match="ATOM/ATOMIC_NUMBER|ATOMIC_WEIGHT">
  <B><xsl:apply-templates/></B>
</xsl:template>
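The precedence rule amounts to splitting on | first and treating each alternative as a complete path. This Python sketch does exactly that, reusing a right-to-left path check that supports only names, * and /. The SUMMARY element is invented so that ATOMIC_NUMBER and ATOMIC_WEIGHT also appear outside an ATOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; SUMMARY is invented to hold the same children outside an ATOM.
DOC = """<PERIODIC_TABLE>
  <ATOM><ATOMIC_NUMBER>1</ATOMIC_NUMBER><ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT></ATOM>
  <SUMMARY><ATOMIC_NUMBER>1</ATOMIC_NUMBER><ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT></SUMMARY>
</PERIODIC_TABLE>"""

root = ET.fromstring(DOC)
parent_of = {child: parent for parent in root.iter() for child in parent}

def matches_path(elem, path):
    # Right-to-left check of a relative path such as ATOM/ATOMIC_NUMBER.
    node = elem
    for step in reversed(path.split("/")):
        if node is None or (step != "*" and node.tag != step):
            return False
        node = parent_of.get(node)
    return True

def matches(elem, pattern):
    # Split on | first, so / binds more tightly: each alternative is a whole path.
    return any(matches_path(elem, alt.strip()) for alt in pattern.split("|"))

pattern = "ATOM/ATOMIC_NUMBER|ATOMIC_WEIGHT"
for parent in root:
    print(parent.tag, [c.tag for c in parent if matches(c, pattern)])
```

Inside SUMMARY only ATOMIC_WEIGHT matches, because the ATOM/ restriction belongs to the first alternative alone.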

Testing with [ ]

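So far, I've merely tested for the presence of various nodes. However, you can test for more details about the nodes that match a pattern using []. You can perform many different tests, including whether an element has a particular child or attribute, and whether the content of a child element or the value of an attribute equals a given string.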

For example, seaborgium, element 106, has only been created in microscopic quantities. Even its most long-lived isotope has a half-life of only 30 seconds. With such a hard-to-create, short-lived element, it's virtually impossible to measure the density, melting point, and other bulk properties. Consequently, the periodic table document omits the elements describing the bulk properties of seaborgium and similar atoms because the data simply doesn’t exist. If you want to create a table of atomic number versus melting point, you should omit those elements with unknown melting points. To do this, you can provide one template for ATOM elements that have MELTING_POINT children and another one for elements that don't, like this:

<!-- Include nothing for arbitrary atoms -->
<xsl:template match="ATOM" />
<!-- Include a table row for atoms that do have 
     melting points. This rule will override the 
     previous one for those atoms that do have 
     melting points. -->
<xsl:template match="ATOM[MELTING_POINT]">
  <tr>
    <td><xsl:value-of select="NAME"/></td>
    <td><xsl:value-of select="MELTING_POINT"/></td>
  </tr>
</xsl:template>

Note that here it is the ATOM element being matched, not the MELTING_POINT element as in the case of ATOM/MELTING_POINT.
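ElementTree supports the [child] predicate, so the difference between filtering ATOM elements and stepping down to MELTING_POINT elements can be shown directly. The excerpt is hypothetical; Seaborgium is given no MELTING_POINT child, following the text's description.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt: Seaborgium has no MELTING_POINT child.
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT></ATOM>
  <ATOM><NAME>Seaborgium</NAME></ATOM>
</PERIODIC_TABLE>"""

table = ET.fromstring(DOC)

# ATOM[MELTING_POINT]: the ATOM elements themselves, filtered by a child test.
atoms = table.findall("ATOM[MELTING_POINT]")
print([a.tag for a in atoms], [a.findtext("NAME") for a in atoms])  # ['ATOM'] ['Hydrogen']

# ATOM/MELTING_POINT: the MELTING_POINT elements, one step further down.
points = table.findall("ATOM/MELTING_POINT")
print([p.tag for p in points])  # ['MELTING_POINT']
```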

The test brackets can contain more than simply a child-element name. In fact, they can contain any XPath expression. (XPath expressions are a superset of match patterns that are discussed in the next section.) The expression is evaluated with the candidate element as the context node. If it selects at least one node (or otherwise evaluates to true), the element matches the total pattern. For example, this template rule matches ATOM elements with NAME or SYMBOL children.

<xsl:template match="ATOM[NAME | SYMBOL]">
</xsl:template>

This template rule matches ATOM elements with a DENSITY child element that has a UNITS attribute:

<xsl:template match="ATOM[DENSITY/@UNITS]">
</xsl:template>

To revisit an earlier example, to correctly find all child elements that have UNITS attributes, use * to find all elements and [@UNITS] to winnow those down to the ones with UNITS attributes, like this:

<xsl:template match="ATOM">
  <xsl:apply-templates select="*[@UNITS]"/>
</xsl:template>

One type of pattern testing that proves especially useful is string equality. An equals sign (=) can test whether the value of a node identically matches a given string. For example, this template finds the ATOM element that contains an ATOMIC_NUMBER element whose content is the string 10 (Neon).

<xsl:template match="ATOM[ATOMIC_NUMBER='10']">
  This is Neon!
</xsl:template>

Testing against element content may seem extremely tricky because of the need to get the value exactly right, including white space. You may find it easier to test against attribute values since those are less likely to contain insignificant white space. For example, the style sheet in Listing 17-9 applies templates only to those ATOM elements whose STATE attribute value is the three letters GAS.

Listing 17-9: An XSLT style sheet that selects only those ATOM elements whose STATE attribute has the value GAS

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="PERIODIC_TABLE">
    <html>
      <head><title>Gases</title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="ATOM"/>
  <xsl:template match="ATOM[@STATE='GAS']">
    <P><xsl:value-of select="."/></P>
  </xsl:template>
</xsl:stylesheet>

You can use other XPath expressions for more complex matches. For example, you can select all elements whose names begin with "A" or all elements with an atomic number less than 100.
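The string tests above, and the pitfall of stray white space in element content, can be tried with ElementTree, which supports [child='text'] and [@attr='value']. It has no numeric comparisons or string functions. For the last two cases the sketch therefore uses plain Python as a stand-in for XPath expressions such as ATOM[ATOMIC_NUMBER < 100] and *[starts-with(name(), 'A')]. Inside an XSLT attribute, the < must be written as &lt;. The three-atom document is hypothetical, and Neon's padded ATOMIC_NUMBER is deliberate.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; Neon's ATOMIC_NUMBER is padded with spaces on purpose.
DOC = """<PERIODIC_TABLE>
  <ATOM STATE="GAS"><NAME>Hydrogen</NAME><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>
  <ATOM STATE="GAS"><NAME>Neon</NAME><ATOMIC_NUMBER> 10 </ATOMIC_NUMBER></ATOM>
  <ATOM><NAME>Seaborgium</NAME><ATOMIC_NUMBER>106</ATOMIC_NUMBER></ATOM>
</PERIODIC_TABLE>"""

table = ET.fromstring(DOC)

def names_of(atoms):
    return [a.findtext("NAME") for a in atoms]

# ATOM[ATOMIC_NUMBER='10'] compares strings exactly, so the padding defeats it.
exact = names_of(table.findall("ATOM[ATOMIC_NUMBER='10']"))     # []
padded = names_of(table.findall("ATOM[ATOMIC_NUMBER=' 10 ']"))  # ['Neon']

# ATOM[@STATE='GAS']: attribute values are less likely to carry stray white space.
gases = names_of(table.findall("ATOM[@STATE='GAS']"))           # ['Hydrogen', 'Neon']

# Plain-Python stand-ins for a numeric comparison and a name test.
# Like XPath's number(), float() ignores the white space around ' 10 '.
light = names_of(a for a in table.findall("ATOM")
                 if float(a.findtext("ATOMIC_NUMBER")) < 100)  # ['Hydrogen', 'Neon']
a_names = sorted({e.tag for e in table.iter() if e.tag.startswith("A")})

print(exact, padded, gases, light, a_names)
```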

XPath Expressions for Selecting Nodes

The select attribute is used in xsl:apply-templates, xsl:value-of, xsl:for-each, xsl:copy-of, xsl:variable, xsl:param, and xsl:sort to specify exactly which nodes are operated on. The value of this attribute is an expression written in the XPath language. The XPath language provides a means of identifying a particular element, group of elements, text fragment, or other part of an XML document. The XPath syntax is used both for XSLT and XPointer.

Cross-reference

XPointers are discussed in Chapter 20. XPath is discussed further in that chapter as well.

Expressions are a superset of the match patterns discussed in the last section. That is, all match patterns are expressions, but not all expressions are match patterns. Recall that match patterns enable you to match nodes by element name, child elements, descendants, and attributes, as well as by making simple tests on these items. XPath expressions allow you to select nodes through all these criteria but also by referring to ancestor nodes, parent nodes, sibling nodes, preceding nodes, and following nodes. Furthermore, expressions aren't limited to producing merely a list of nodes, but can also produce booleans, numbers, and strings.

Node axes

Expressions are not limited to specifying the children and descendants of the current node. XPath provides a number of axes that you can use to select from different parts of the tree relative to some particular node in the tree called the context node. In XSLT, the context node is normally initialized to the current node that the template matches, though there are ways to change this. Table 17-2 summarizes the axes and their meanings.

Table 17-2: Expression Axes

Axis:

Selects From:

ancestor

The parent of the context node, the parent of the parent of the context node, the parent of the parent of the parent of the context node, and so forth back to the root node

ancestor-or-self

The ancestors of the context node and the context node itself

attribute

The attributes of the context node

child

The immediate children of the context node

descendant

The children of the context node, the children of the children of the context node, and so forth

descendant-or-self

The context node itself and its descendants

following

All nodes that start after the end of the context node, excluding attribute and namespace nodes

following-sibling

All nodes that start after the end of the context node and have the same parent as the context node

namespace

The namespace nodes of the context node, one for each namespace in scope on that element

parent

The unique parent node of the context node

preceding

All nodes that finish before the beginning of the context node, excluding attribute and namespace nodes

preceding-sibling

All nodes that start before the beginning of the context node and have the same parent as the context node

self

The context node
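Several of these axes can be imitated over ElementTree once a child-to-parent map exists. The following Python sketch implements ancestor, following-sibling, and preceding-sibling for elements only. It returns the reverse axes nearest-first, which is how positions are counted on them in XPath. It stops at the root element, because ElementTree has no separate root node. The function names mirror the axis names but are hypothetical helpers, and the document is a trimmed excerpt.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of the Hydrogen ATOM.
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>
</PERIODIC_TABLE>"""

root = ET.fromstring(DOC)
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestor(node):
    # Parent, grandparent, ... nearest first; stops at the root element.
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def following_sibling(node):
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

def preceding_sibling(node):
    # Nearest first, matching how positions count on this reverse axis.
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)][::-1]

symbol = root.find("ATOM/SYMBOL")
print([e.tag for e in ancestor(symbol)])           # ['ATOM', 'PERIODIC_TABLE']
print([e.tag for e in following_sibling(symbol)])  # ['ATOMIC_NUMBER']
print([e.tag for e in preceding_sibling(symbol)])  # ['NAME']
```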

Choosing an axis limits the expression so that it only selects from the set of nodes indicated in the second column of Table 17-2. The axis is generally followed by a double colon (::) and a node test that further winnows down this node set. For example, a node test may contain the name of the element to be selected as in the following template rule:

<xsl:template match="ATOM">
  <tr>
    <td>
      <xsl:value-of select="child::NAME"/>
    </td>
    <td>
      <xsl:value-of select="child::ATOMIC_NUMBER"/>
    </td>
    <td>
      <xsl:value-of select="child::ATOMIC_WEIGHT"/>
    </td>
  </tr>
</xsl:template>

The template rule matches ATOM elements. When an ATOM element is matched, that element becomes the context node. A NAME element, an ATOMIC_NUMBER element, and an ATOMIC_WEIGHT element are all selected from the children of that matched ATOM element and output as table cells. (If there's more than one of these desired elements, for example three NAME elements, then all are selected but only the value of the first one is taken.)

The child axis doesn't let you do anything that you can't do with element names alone. In fact select="ATOMIC_WEIGHT" is just an abbreviated form of select="child::ATOMIC_WEIGHT". However, the other axes are a little more interesting.

Referring to the parent element is illegal in match patterns, but not in expressions. To refer to the parent, you use the parent axis. For example, this template matches BOILING_POINT elements but outputs the value of the parent ATOM element:

<xsl:template match="BOILING_POINT">
  <P><xsl:value-of select="parent::ATOM"/></P>
</xsl:template>

Some radioactive atoms such as polonium have half-lives so short that bulk properties such as the boiling point and melting point can't be measured. Therefore, not all ATOM elements necessarily have BOILING_POINT child elements. The above rule enables you to write a template that only outputs those elements that actually have boiling points. Expanding on this example, Listing 17-10 matches the MELTING_POINT elements but actually outputs the parent ATOM element using parent::ATOM.

Listing 17-10: A style sheet that outputs only those elements with known melting points

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <html>
        <body>
          <xsl:apply-templates select="PERIODIC_TABLE"/>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="PERIODIC_TABLE">
      <h1>Elements with known Melting Points</h1>
      <xsl:apply-templates select=".//MELTING_POINT"/>
    </xsl:template>
    <xsl:template match="MELTING_POINT">
      <p>
        <xsl:value-of select="parent::ATOM"/>
      </p>
    </xsl:template>
</xsl:stylesheet>

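Once in a while, you may need to select the nearest ancestor of an element with a given type. The ancestor axis does this when combined with the predicate [1] (predicates are discussed later in this chapter). Positions along the ancestor axis count outward from the context node, so [1] picks the closest match. Without it, xsl:value-of would use the outermost matching ancestor, because it takes the first node in document order. For example, this rule inserts the value of the nearest PERIODIC_TABLE element that contains the matched SYMBOL element:

<xsl:template match="SYMBOL">
  <xsl:value-of select="ancestor::PERIODIC_TABLE[1]"/>
</xsl:template>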

The ancestor-or-self axis behaves like the ancestor axis except that if the context node passes the node test, then it will be returned as well. For example, this rule matches all elements. If the matched element is a PERIODIC_TABLE, then that very PERIODIC_TABLE is selected in xsl:value-of.

<xsl:template match="*">
  <xsl:value-of select="ancestor-or-self::PERIODIC_TABLE"/>
</xsl:template>
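The ancestor axis also pairs naturally with the count() and name() functions from Table 17-4. In the following sketch, count(ancestor::*) counts the elements enclosing each SYMBOL, and ancestor::*[1] picks the nearest of them, since position 1 on the ancestor axis is the closest node. The wording of the output is arbitrary and only for illustration.

```xml
<!-- For each SYMBOL, report its nesting depth and the name
     of the nearest enclosing element -->
<xsl:template match="SYMBOL">
  <p>
    <xsl:value-of select="."/>
    <xsl:text> is nested </xsl:text>
    <!-- count(ancestor::*) counts every enclosing element -->
    <xsl:value-of select="count(ancestor::*)"/>
    <xsl:text> levels deep, directly inside </xsl:text>
    <!-- [1] on a reverse axis selects the closest ancestor -->
    <xsl:value-of select="name(ancestor::*[1])"/>
  </p>
</xsl:template>
```

Assuming the structure of Listing 17-1 (SYMBOL inside ATOM inside PERIODIC_TABLE), this reports a depth of 2 and the name ATOM.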

Node tests

Instead of the name of a node, the axis may be followed by one of these four node-type functions:

The comment() function selects a comment node. The text() function selects a text node. The processing-instruction() function selects a processing instruction node, and the node() function selects any type of node. (The * wild card selects only element nodes, except on the attribute and namespace axes, where it selects attribute and namespace nodes respectively.) The processing-instruction() node type can also contain an optional argument specifying the name of the processing instruction to select.
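The following sketch counts different kinds of nodes to show what each node test selects. It assumes the periodic-table vocabulary used throughout this chapter. The xml-stylesheet target in the first template is only an example of the optional processing-instruction() argument. White space between child elements also forms text nodes unless it's stripped, so child::text() often counts more nodes than you might expect.

```xml
<xsl:template match="/">
  <!-- With an argument, processing-instruction() selects only
       instructions that have that target -->
  <p>
    <xsl:text>Style sheet instructions: </xsl:text>
    <xsl:value-of select="count(processing-instruction('xml-stylesheet'))"/>
  </p>
  <xsl:apply-templates select="PERIODIC_TABLE/ATOM"/>
</xsl:template>

<xsl:template match="ATOM">
  <p>
    <xsl:value-of select="NAME"/>
    <xsl:text>: </xsl:text>
    <!-- node() selects children of every type -->
    <xsl:value-of select="count(child::node())"/>
    <xsl:text> child nodes, of which </xsl:text>
    <!-- * selects only the element children -->
    <xsl:value-of select="count(child::*)"/>
    <xsl:text> are elements, </xsl:text>
    <!-- text() includes white-space-only text nodes unless stripped -->
    <xsl:value-of select="count(child::text())"/>
    <xsl:text> are text nodes, and </xsl:text>
    <xsl:value-of select="count(child::comment())"/>
    <xsl:text> are comments</xsl:text>
  </p>
</xsl:template>
```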

Hierarchy operators

You can use the / and // operators to string expressions together. For example, Listing 17-11 prints a table of element names, atomic numbers, and melting points for only those elements that have melting points. It does this by selecting the parent of the MELTING_POINT element, then finding that parent's NAME and ATOMIC_NUMBER children with select="parent::*/child::NAME" and select="parent::*/child::ATOMIC_NUMBER".

Listing 17-11: A table of melting point versus atomic number

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/PERIODIC_TABLE">
      <html>
        <body>
          <h1>Atomic Number vs. Melting Point</h1>
          <table>
            <th>Element</th>
            <th>Atomic Number</th>
            <th>Melting Point</th>
            <xsl:apply-templates select="child::ATOM"/>
          </table>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="ATOM">
      <xsl:apply-templates
       select="child::MELTING_POINT"/>
    </xsl:template>
    <xsl:template match="MELTING_POINT">
       <tr>
        <td>
          <xsl:value-of select="parent::*/child::NAME"/>
        </td>
        <td>
          <xsl:value-of
         select="parent::*/child::ATOMIC_NUMBER"/>
        </td>
        <td>
          <xsl:value-of select="self::*"/>
          <xsl:value-of select="attribute::UNITS"/>
        </td>
      </tr>
   </xsl:template>
</xsl:stylesheet>

This is not the only way to solve the problem. Another possibility is to use the preceding-sibling and following-sibling axes. If you don't know whether the NAME and ATOMIC_NUMBER elements come before or after the MELTING_POINT element, combine both axes with the union operator |. The necessary template rule for the MELTING_POINT element looks like this:

<xsl:template match="MELTING_POINT">
  <tr>
    <td>
      <xsl:value-of
        select="preceding-sibling::NAME
              | following-sibling::NAME"/>
    </td>
    <td>
      <xsl:value-of
        select="preceding-sibling::ATOMIC_NUMBER
              | following-sibling::ATOMIC_NUMBER"/>
    </td>
    <td>
      <xsl:value-of select="self::*"/>
      <xsl:value-of select="attribute::UNITS"/>
    </td>
  </tr>
</xsl:template>

Abbreviated syntax

The various axes in Table 17-2 are a bit too wordy for comfortable typing. XPath also defines an abbreviated syntax that can substitute for the most common of these axes and is more commonly used in practice. Table 17-3 shows the full and abbreviated equivalents.

Table 17-3: Abbreviated Syntax for XPath Expressions

Abbreviation    Full
.               self::node()
..              parent::node()
name            child::name
@name           attribute::name
//              /descendant-or-self::node()/

Listing 17-12 rewrites Listing 17-11 using the abbreviated syntax. The two style sheets produce exactly the same output.

Listing 17-12: A table of melting point versus atomic number using the abbreviated syntax

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/PERIODIC_TABLE">
      <html>
        <body>
          <h1>Atomic Number vs. Melting Point</h1>
          <table>
            <th>Element</th>
            <th>Atomic Number</th>
            <th>Melting Point</th>
            <xsl:apply-templates select="ATOM"/>
          </table>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="ATOM">
      <xsl:apply-templates
       select="MELTING_POINT"/>
    </xsl:template>
    <xsl:template match="MELTING_POINT">
       <tr>
        <td>
          <xsl:value-of
           select="../NAME"/>
        </td>
        <td>
          <xsl:value-of
         select="../ATOMIC_NUMBER"/>
        </td>
        <td>
          <xsl:value-of select="."/>
          <xsl:value-of select="@UNITS"/>
        </td>
      </tr>
   </xsl:template>
</xsl:stylesheet>

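The location steps of a match pattern can use only the child and attribute axes, written either in full (child::, attribute::) or in abbreviated form, and joined by / or //. The other axes of Table 17-2 are restricted to expressions, although expressions inside a pattern's predicates may use them.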
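The rules below sketch where the line falls. The restriction applies to the location steps of the pattern itself. A predicate inside a pattern may contain any expression, including one that uses the other axes. Keep in mind that rules matching attributes fire only when some rule applies templates to those attributes.

```xml
<!-- Legal: the steps use only the child and attribute axes,
     joined with / and // -->
<xsl:template match="PERIODIC_TABLE//DENSITY/@UNITS">
  <xsl:value-of select="."/>
</xsl:template>

<!-- Also legal: the same two axes written out in full -->
<xsl:template match="child::ATOM/attribute::STATE">
  <xsl:value-of select="."/>
</xsl:template>

<!-- Legal: the step uses the child axis; the predicate is an
     ordinary expression and may use any axis -->
<xsl:template match="ATOM[preceding-sibling::ATOM]">
  <xsl:value-of select="NAME"/>
</xsl:template>

<!-- Illegal, so commented out: parent is not a pattern axis
<xsl:template match="parent::ATOM"> ... </xsl:template>
-->
```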

Expression types

Every expression evaluates to a single value. For example, the expression 3 + 2 evaluates to the value 5. The expressions used so far have all evaluated to node sets. However, XSLT expressions can produce five types of values: node sets, booleans, numbers, strings, and result tree fragments.

Node sets

A node set is an unordered group of nodes from the input document. The axes in Table 17-2 all return a node set containing the nodes they match. Which nodes are in the node set depends on the context node, the node test, and the axis.

For example, when the context node is the PERIODIC_TABLE element of Listing 17-1, the XPath expression select="child::ATOM" returns a node set that contains both ATOM elements in that document. The XPath expression select="child::ATOM/child::NAME" returns a node set containing the two element nodes <NAME>Hydrogen</NAME> and <NAME>Helium</NAME> when the context node is the PERIODIC_TABLE element of Listing 17-1.

The context node is a member of the context node list. The context node list is the group of nodes being processed by the same instruction, generally a single xsl:apply-templates or xsl:for-each element. For instance, when Listing 17-12 is applied to Listing 17-1, the ATOM template is invoked twice, first for the hydrogen atom and then for the helium atom. The first time it's invoked, the context node is the hydrogen ATOM element; the second time, it's the helium ATOM element. Both times, however, the context node list is the set containing both the hydrogen and helium ATOM elements that the xsl:apply-templates select="ATOM" instruction selected.
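Because the context node list belongs to the instruction that selected the nodes, the position() and last() functions (see Table 17-4) report where the current node falls in that list. This sketch labels each atom accordingly. The wording is illustrative.

```xml
<xsl:template match="PERIODIC_TABLE">
  <!-- This instruction builds the context node list: every ATOM child -->
  <xsl:apply-templates select="ATOM"/>
</xsl:template>

<xsl:template match="ATOM">
  <p>
    <xsl:value-of select="NAME"/>
    <xsl:text> is atom </xsl:text>
    <!-- position of this ATOM within the list selected above -->
    <xsl:value-of select="position()"/>
    <xsl:text> of </xsl:text>
    <!-- size of that list -->
    <xsl:value-of select="last()"/>
  </p>
</xsl:template>
```

Applied to Listing 17-1, the two paragraphs read "Hydrogen is atom 1 of 2" and "Helium is atom 2 of 2".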

Table 17-4 lists a number of functions that operate on node sets, either as arguments or as the context node.

Table 17-4: Functions That Operate on or Return Node Sets

Function:

Return Type:

Returns:

position()

number

The position of the context node in the context node list; the first node in the list has position 1

last()

number

The number of nodes in the context node list; this is the same as the position of the last node in the list

count(node-set)

number

The number of nodes in node-set.

id(string1 string2 string3…)

node set

A node set containing all the elements anywhere in the same document whose ID-type attribute (as declared in the DTD) has one of the values named in the argument list; the empty set if no element has the specified ID.

key(string name, Object value)

node set

A node set containing all nodes in this document that have a key with the specified value. Keys are set with the top-level xsl:key element.

document(object URI, node-set base)

node set

The nodes in the document referred to by the URI. If the URI has a fragment identifier, such as a named anchor or an XPointer, the nodes it identifies are chosen; processors differ in which kinds of fragment identifiers they support. If there is no fragment identifier, the node set contains the root node of the named document. If the second argument is present, relative URIs are resolved against the base URI of the first node in that node set. If it is omitted, relative URIs given as strings are relative to the URI of the style sheet (not the source document!).

local-name(node set)

string

The local name (everything after the namespace prefix) of the first node in the node set argument; can be used without any arguments to get the local name of the context node.

namespace-uri(node set)

string

The URI of the namespace of the first node in the node set; can be used without any arguments to get the URI of the namespace of the context node; returns an empty string if the node is not in a namespace.

name(node set)

string

The qualified name (both prefix and local part) of the first node in the node set argument; can be used without an argument to get the qualified name of the context node.

generate-id(node set)

string

A unique identifier for the first node in the argument node set; can be used without any argument to generate an ID for the context node.

If an argument of the wrong type is passed to one of these functions, then XSLT will attempt to convert that argument to the correct type; for instance, by converting the number 12 to the string "12". However, no arguments may be converted to node sets.
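Here is a sketch that exercises several of these functions in one style sheet. The key name atom-by-symbol is hypothetical. The sketch assumes, as in Listing 17-1, that each ATOM has a SYMBOL child, and that helium's SYMBOL contains exactly He with no surrounding white space.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key: index ATOM elements by the text of
       their SYMBOL child -->
  <xsl:key name="atom-by-symbol" match="ATOM" use="SYMBOL"/>

  <xsl:template match="/PERIODIC_TABLE">
    <html>
      <body>
        <!-- count() returns the size of a node set -->
        <p>Atoms: <xsl:value-of select="count(ATOM)"/></p>
        <!-- key() returns the ATOMs whose SYMBOL is exactly "He" -->
        <p>He: <xsl:value-of select="key('atom-by-symbol', 'He')/NAME"/></p>
        <!-- generate-id() yields a unique name for each node;
             name() returns the element's qualified name -->
        <xsl:for-each select="ATOM">
          <p><a name="{generate-id()}"><xsl:value-of select="name()"/></a></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```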

The position() function can be used to determine an element's position within a node set. Listing 17-13 is a style sheet that prefixes each atom's name with its position in the document using <xsl:value-of select="position()"/>.

Listing 17-13: A style sheet that numbers the atoms in the order they appear in the document

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/PERIODIC_TABLE">
    <HTML>
      <HEAD><TITLE>The Elements</TITLE></HEAD>
      <BODY>
        <xsl:apply-templates select="ATOM"/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="ATOM">
    <P>
      <xsl:value-of select="position()"/>.
      <xsl:value-of select="NAME"/>
    </P>
  </xsl:template>
</xsl:stylesheet>

When this style sheet is applied to Listing 17-1, the output is this:

<HTML>
<HEAD>
<TITLE>The Elements</TITLE>
</HEAD>
<BODY>
<P>1.
      Hydrogen</P>
<P>2.
      Helium</P>
</BODY>
</HTML>

Booleans

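A boolean has one of two values: true or false. XSLT allows any kind of data to be transformed into a boolean. This is often done implicitly when a string, a number, or a node set is used where a boolean is expected, as in the test attribute of an xsl:if element. These conversions can also be performed by the boolean() function, which converts an argument of any type to a boolean according to these rules:

A number is false if it's zero or NaN (not a number), and true otherwise.

A node set is true if it contains at least one node, and false if it's empty.

A string is true if it contains at least one character, and false if it's empty. Even a string of white space, or the string "false", is true.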

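Booleans are also produced as the result of expressions involving these comparison operators: = (equal), != (not equal), < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to).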

Caution

The < sign is illegal in attribute values. Consequently, it must be replaced by &lt; even when used as the less-than operator.
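This sketch uses both kinds of boolean: an implicit conversion of a node set, and a comparison whose < has been escaped. It assumes the ATOM vocabulary of Listing 17-1, and the messages are illustrative.

```xml
<xsl:template match="ATOM">
  <!-- A node set converts to true when it isn't empty, so this
       tests whether the ATOM has any BOILING_POINT child -->
  <xsl:if test="BOILING_POINT">
    <p><xsl:value-of select="NAME"/> has a known boiling point.</p>
  </xsl:if>
  <!-- The ATOMIC_NUMBER child is compared as a number; the
       less-than sign is escaped because this is an attribute -->
  <xsl:if test="ATOMIC_NUMBER &lt;= 10">
    <p><xsl:value-of select="NAME"/> is among the first ten elements.</p>
  </xsl:if>
</xsl:template>
```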

These operators are most commonly used in predicate tests to determine whether a rule should be invoked. An XPath expression can contain not only a pattern that selects certain nodes, but also a predicate that further filters the set of nodes selected. For example, child::ATOM selects all the ATOM children of the context node. However, child::ATOM[position()=1] selects only the first ATOM child of the context node. [position()=1] is a predicate on the node test ATOM that returns a boolean result: true if the position of the ATOM is equal to one; false otherwise. Each node test can have any number of predicates. However, more than one is unusual.

For example, this template rule applies to the first ATOM element in the periodic table, but not to subsequent ones, by testing whether or not the position of the element equals 1.

<xsl:template match="PERIODIC_TABLE/ATOM[position()=1]">
  <xsl:value-of select="."/>
</xsl:template>

This template rule applies to all ATOM elements that are not the first ATOM child of the PERIODIC_TABLE by testing whether the position is greater than 1:

<xsl:template match="PERIODIC_TABLE/ATOM[position()>1]">
  <xsl:value-of select="."/>
</xsl:template>
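When a node test carries several predicates, they are applied left to right, each one filtering the result of the previous one, so their order can matter. This sketch contrasts two selections made from a PERIODIC_TABLE element.

```xml
<xsl:template match="PERIODIC_TABLE">
  <!-- Filter by MELTING_POINT first, then take the first survivor:
       the first ATOM that has a MELTING_POINT child -->
  <xsl:apply-templates select="ATOM[MELTING_POINT][position()=1]"/>
  <!-- Take the first ATOM first, then keep it only if it has a
       MELTING_POINT child; this may select nothing at all -->
  <xsl:apply-templates select="ATOM[position()=1][MELTING_POINT]"/>
</xsl:template>
```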

The keywords and and or logically combine two boolean expressions according to the normal rules of logic. For example, suppose you want a template that matches an ATOMIC_NUMBER element that is both the first and last ATOMIC_NUMBER child of its parent element; that is, it is the only ATOMIC_NUMBER element its parent contains. (In a pattern like this, position() and last() count only the siblings that pass the node test, here ATOMIC_NUMBER.) This template rule uses and to accomplish that:

<xsl:template
 match="ATOMIC_NUMBER[position()=1 and position()=last()]">
  <xsl:value-of select="."/>
</xsl:template>

If the first condition is false, then the complete and expression is guaranteed to be false. Consequently, the second condition won't be checked.

This template matches both the first and last ATOM elements in their parent by matching when the position is 1 or when the position is equal to the number of elements in the set:

<xsl:template match="ATOM[position()=1 or position()=last()]">
  <xsl:value-of select="."/>
</xsl:template>

This is logical or, so it will also match if both conditions are true. That is, it will match an ATOM that is both the first and last child of its parent. If the first condition is true, then the complete or expression is guaranteed to be true. Consequently, the second condition won't be checked.

The not() function reverses the result of an operation. For example, this template rule matches all ATOM elements that are not the first child of their parents:

<xsl:template match="ATOM[not(position()=1)]">
  <xsl:value-of select="."/>
</xsl:template>

The same template rule could be written using the not equal operator != instead:

<xsl:template match="ATOM[position()!=1]">
  <xsl:value-of select="."/>
</xsl:template>

This template rule matches all ATOM elements that are neither the first nor last ATOM child of their parent:

<xsl:template match =
 "ATOM[not(position()=1 or position()=last())]">
  <xsl:value-of select="."/>
</xsl:template>

XSLT does not have an exclusive or operator. However, one can be formed by judicious use of not(), and, and or. For example, this rule selects those ATOM elements that are either the first or last child, but not both:

<xsl:template
 match="ATOM[(position()=1 or position()=last())
              and not(position()=1 and position()=last())]">
  <xsl:value-of select="."/>
</xsl:template>
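A shorter way to get exclusive or, offered as an alternative to the rule above, relies on comparing two booleans with !=. That comparison is true exactly when one side is true and the other is false.

```xml
<!-- Both sides of != are booleans, so they're compared as
     booleans: true when exactly one condition holds -->
<xsl:template
 match="ATOM[(position()=1) != (position()=last())]">
  <xsl:value-of select="."/>
</xsl:template>
```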

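There are three remaining functions that return booleans:

true() always returns true.

false() always returns false.

lang(code) returns true if the language of the context node is the same as code or is a sublanguage of code (for example, en-US is a sublanguage of en). The language is given by the xml:lang attribute on the context node or on its nearest ancestor that has one. lang() returns false otherwise, including when no xml:lang attribute is present.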

Numbers

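XPath numbers are 64-bit IEEE 754 floating-point doubles. Even numbers like 42 or -7000 that look like integers are stored as doubles. Nonnumber values such as strings and booleans are converted to numbers automatically as necessary, or at user request through the number() function using these rules:

A boolean is 1 if true and 0 if false.

A string is trimmed of leading and trailing white space and converted to the number it represents, such as 3.14 or -7. A string that doesn't have this form converts to NaN (not a number), and this includes strings with an exponent, such as 1e3, or with a leading plus sign.

A node set is first converted to a string as if by the string() function, and that string is then converted to a number.

A result tree fragment is converted to a string, which is then converted to a number.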

For example, this template only outputs the nonnaturally occurring transuranium elements; that is, those elements with atomic numbers greater than 92 (the atomic number of uranium). The node set produced by ATOMIC_NUMBER is implicitly converted to the string value of the current ATOMIC_NUMBER node. This string is then converted into a number.

<xsl:template match="/PERIODIC_TABLE">
  <HTML>
    <HEAD><TITLE>The Transuranium Elements</TITLE></HEAD>
    <BODY>
      <xsl:apply-templates select="ATOM[ATOMIC_NUMBER>92]"/>
    </BODY>
  </HTML>
</xsl:template>

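XPath provides the standard four arithmetic operators: + for addition, - for subtraction, * for multiplication, and div for division. (Division uses the keyword div because / already separates location steps.) Because hyphens are legal in XML names, put white space around the - operator: ATOMIC_WEIGHT-ATOMIC_NUMBER is read as a single element name, not as a subtraction.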

For example, <xsl:value-of select="2+2"/> inserts the string "4" into the output document. These operations are more commonly used as part of a test. For example, this rule selects those elements whose atomic weight is more than twice their atomic number:

<xsl:template match="/PERIODIC_TABLE">
  <HTML>
    <BODY>
      <H1>High Atomic Weight to Atomic Number Ratios</H1>
      <xsl:apply-templates
        select="ATOM[ATOMIC_WEIGHT > 2 * ATOMIC_NUMBER]"/>
    </BODY>
  </HTML>
</xsl:template>

This template actually prints the ratio of atomic weight to atomic number:

<xsl:template match="ATOM">
  <p>
    <xsl:value-of select="NAME"/>
    <xsl:value-of select="ATOMIC_WEIGHT div ATOMIC_NUMBER"/>
  </p>
</xsl:template>

XPath also provides the less-familiar mod binary operator, which returns the remainder when its first operand is divided by its second. When used in conjunction with position(), this operator lets you perform tasks such as outputting every second ATOM or alternating colors between rows in a table. Just define templates that apply different styles when the position mod 2 is 1 and when it's 0. For example, these two rules use different colors for alternate rows of a table:

   <xsl:template match="ATOM[position() mod 2 = 1]">
       <tr>
        <td><xsl:value-of select="NAME"/></td>
        <td><xsl:value-of select="ATOMIC_NUMBER"/></td>
        <td><xsl:apply-templates select="MELTING_POINT"/></td>
      </tr>
   </xsl:template>
   <xsl:template match="ATOM[position() mod 2 = 0]">
       <tr style="color: #666666">
        <td><xsl:value-of select="NAME"/></td>
        <td><xsl:value-of select="ATOMIC_NUMBER"/></td>
        <td><xsl:apply-templates select="MELTING_POINT"/></td>
      </tr>
   </xsl:template>

You can change the divisor to 3 to apply different styles to every third element, to 4 to apply different styles to every fourth element, and so forth.

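Finally, XPath includes four functions that operate on numbers:

floor(number) returns the greatest integer that is less than or equal to number.

ceiling(number) returns the smallest integer that is greater than or equal to number.

round(number) returns the integer closest to number. A number exactly halfway between two integers rounds toward positive infinity, so round(2.5) is 3 and round(-2.5) is -2.

sum(node-set) converts the string value of each node in node-set to a number and returns the total.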

For example, this template rule estimates the number of neutrons in an atom by subtracting the atomic number (the number of protons) from the atomic weight (roughly the number of protons plus neutrons, averaged over the natural mix of isotopes) and rounding to the nearest integer:

<xsl:template match="ATOM">
  <p>
    <xsl:value-of select="NAME"/>
    <xsl:value-of
     select="round(ATOMIC_WEIGHT - ATOMIC_NUMBER)"/>
  </p>
</xsl:template>

This rule calculates the average atomic weight of all the atoms in the table by adding all the atomic weights, and then dividing by the number of atoms:

<xsl:template match="/PERIODIC_TABLE">
  <HTML>
    <BODY>
    <H1>Average Atomic Weight</H1>
      <xsl:value-of
       select="sum(descendant::ATOMIC_WEIGHT)
               div count(descendant::ATOMIC_WEIGHT)"/>
    </BODY>
  </HTML>
</xsl:template>

Strings

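A string is a sequence of Unicode characters. Other data types can be converted to strings using the string() function according to these rules:

A node set is converted to the string value of its first node in document order, or to the empty string if the node set is empty. The string value of an element is all the text it contains, with the markup removed.

A number is written in decimal notation without an exponent. Integers appear without a decimal point (so 4 becomes "4", not "4.0"), NaN becomes "NaN", and the infinities become "Infinity" and "-Infinity".

A boolean becomes "true" or "false".

A result tree fragment becomes the text it contains, with the markup removed.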

Besides string(), XSLT contains 10 functions that manipulate strings. These are summarized in Table 17-5.

Table 17-5: XPath String Functions

Function:

Return Type:

Returns:

starts-with(main_string, prefix_string)

Boolean

True if main_string starts with prefix_string; false otherwise

contains(containing_string, contained_string)

Boolean

True if the contained_string is part of the containing_string; false otherwise

substring(string, offset, length)

String

length characters starting at the specified offset in string, where the first character in the string is at offset 1; or all characters from the offset to the end of the string if length is omitted; length and offset are rounded to the nearest integer if necessary

substring-before(string, marker-string)

String

The part of the string from the first character up to (but not including) the first occurrence of marker-string; the empty string if marker-string doesn't occur in string

substring-after(string, marker-string)

String

The part of the string from the end of the first occurrence of marker-string to the end of string; the empty string if marker-string doesn't occur in string

string-length(string)

Number

The number of characters in string; if the argument is omitted, the length of the string value of the context node

normalize-space(string)

String

The string after leading and trailing white space is stripped and runs of white space are replaced with a single space; if the argument is omitted the string value of the context node is normalized

translate(string, replaced_text, replacement_text)

String

Returns string with occurrences of characters in replaced_text replaced by the corresponding characters from replacement_text; characters in replaced_text that have no corresponding character in replacement_text are deleted

concat(string1, string2, . . . )

String

Returns the concatenation of as many strings as are passed as arguments in the order they were passed

format-number(number, format-string, locale-string)

String

Returns the string form of number formatted according to the specified format-string as if by Java 1.1's java.text.DecimalFormat class (see http://java.sun.com/products/jdk/1.1/docs/api/java.text.DecimalFormat.html); the locale-string is an optional argument that provides the name of the xsl:decimal-format element used to interpret the format-string
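As a sketch of how these functions combine, the rules below assume the DENSITY and SYMBOL elements of Listing 17-1, with a UNITS attribute of the form grams/cubic centimeter. The output layout is invented for illustration.

```xml
<xsl:template match="DENSITY">
  <p>
    <!-- normalize-space() strips the white space around the value;
         format-number() converts the result to a number and
         formats it with four decimal places -->
    <xsl:value-of select="format-number(normalize-space(.), '0.0000')"/>
    <xsl:text> </xsl:text>
    <!-- Split UNITS at the slash; both functions return an
         empty string if UNITS contains no slash -->
    <xsl:value-of select="substring-before(@UNITS, '/')"/>
    <xsl:text> per </xsl:text>
    <xsl:value-of select="substring-after(@UNITS, '/')"/>
  </p>
</xsl:template>

<xsl:template match="SYMBOL">
  <!-- translate() swaps each lowercase letter for the uppercase
       letter at the same position; concat() adds the brackets -->
  <xsl:value-of select="concat('[',
    translate(., 'abcdefghijklmnopqrstuvwxyz',
                 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), ']')"/>
</xsl:template>
```

With UNITS="grams/cubic centimeter", the two substring calls yield grams and cubic centimeter.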

Result tree fragments

A result tree fragment is a piece of output, with its own text and markup, that a style sheet builds rather than selects from the input document. The most common source is an xsl:variable element whose value is given by its content rather than by a select attribute. Result tree fragments may also be returned by some extension functions (functions unique to a particular XSLT implementation or installation).

Because a result tree fragment is not a node set, you can't do much with it in XSLT 1.0. You can't apply the /, //, or [] operators to it to reach the nodes inside. Essentially, the only operations allowed are those that would be allowed on a string: converting it to a string, number, or boolean, for instance with string() or boolean(). The one exception is that xsl:copy-of can copy the entire fragment, markup included, into the output.
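In XSLT 1.0, the usual way to create a result tree fragment is an xsl:variable with content. This sketch builds one and then uses it in the two ways that are permitted. The variable name label is hypothetical.

```xml
<xsl:template match="ATOM">
  <!-- The content of this variable is a result tree fragment,
       not a node set -->
  <xsl:variable name="label">
    <b><xsl:value-of select="SYMBOL"/></b>
  </xsl:variable>
  <p>
    <!-- Allowed: copy the whole fragment, markup included -->
    <xsl:copy-of select="$label"/>
    <xsl:text> </xsl:text>
    <!-- Allowed: treat it as a string -->
    <xsl:value-of select="string-length($label)"/>
    <!-- Not allowed in XSLT 1.0: a path such as $label/b -->
  </p>
</xsl:template>
```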

The Default Template Rules

Having to carefully map the hierarchy of an XML document in an XSLT style sheet may be inconvenient. This is especially true if the document does not follow a stable, predictable order like the periodic table, but rather throws elements together willy-nilly like many Web pages. In those cases, you should have general rules that can find an element and apply templates to it regardless of where it appears in the source document.

To make this process easier, XSLT defines several default template rules that are implicitly included in all style sheets. The first default rule matches root and element nodes, and applies templates to all child nodes. The second default rule matches text nodes and attributes, copying their values onto the output stream. Together these two rules mean that even a blank XSLT style sheet with just one empty xsl:stylesheet element will still produce the raw character data of the input XML document as output.

The default rule for elements

The first default rule applies to element nodes and the root node:

<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

*|/ is XPath shorthand for "any element node or the root node." The purpose of this rule is to ensure that all elements are recursively processed even if they aren't reached by following the explicit rules. That is, unless another rule overrides this one (especially for the root node), all element nodes will be processed.

However, once an explicit rule matches an element, the default rule is not activated for that element's children unless the explicit rule itself applies templates to them, typically with xsl:apply-templates. For instance, you can stop all processing by matching the root node and neither applying templates nor using xsl:for-each to process its children, like this:

<xsl:template match="/">
</xsl:template>

The default rule for text nodes and attributes

Exceptionally observant readers may have noticed that several of the examples seem to output the contents of some elements without ever taking the value of those elements! These contents were provided by XSLT's default rule for text and attribute nodes. This rule is:

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

This rule matches all text and attribute nodes (match="text()|@*") and outputs the value of the node (<xsl:value-of select="."/>). In other words, it copies the text from the input to the output. This rule ensures that at the very least an element's text is output, even if no rule specifically matches it. Another rule can override this one for specific elements where you want either more or less than the text content of an element.

This rule also copies attribute values (but not names). However, they turn from attributes in the input to simple text in the output. Because there's no default rule that ever applies templates to attributes, this rule won't be activated for attributes unless you specifically add a nondefault rule somewhere in the style sheet that does apply templates to attributes of one or more elements.
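For example, this sketch triggers the default attribute rule by applying templates to the attributes of a DENSITY element. It assumes a DENSITY element with a UNITS attribute, as in Listing 17-1.

```xml
<!-- Applying templates to @* lets the built-in text()|@* rule
     output each attribute's value (not its name) as plain text -->
<xsl:template match="DENSITY">
  <p>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="@*"/>
  </p>
</xsl:template>
```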

The default rule for processing instructions and comments

There's also a default rule for processing instructions and comments. It simply says to do nothing; that is, drop the processing instructions and comments from the output as if they didn't exist. It looks like this:

<xsl:template match="processing-instruction()|comment()"/>

You can, of course, replace this with your own rule for handling processing instructions and comments if you want to.
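Here is one possible replacement, as a sketch. It copies comments and processing instructions through unchanged but still drops xml-stylesheet instructions, on the assumption that those belong to the input document rather than the output.

```xml
<!-- Copy comments and processing instructions into the output
     instead of dropping them; xsl:copy on either kind of node
     copies the whole node, content included -->
<xsl:template match="processing-instruction()|comment()">
  <xsl:copy/>
</xsl:template>

<!-- Still drop xml-stylesheet instructions: a pattern naming the
     target has a higher default priority (0) than the unnamed
     processing-instruction() test (-0.5), so this rule wins -->
<xsl:template match="processing-instruction('xml-stylesheet')"/>
```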

Implications of the default rules

Together, the default rules imply that applying an empty style sheet with only an xsl:stylesheet or xsl:transform element but no children (such as Listing 17-14) to an XML document copies all the #PCDATA out of the elements in the input to the output. This produces no markup, only text. The default rules have extremely low priority, so any other matching rule takes precedence over them.

Listing 17-14: An empty XML style sheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>

Caution

One of the most common sources of confusion about XSLT in Internet Explorer 5.5 and earlier is that IE does not provide any of these default rules. You have to make sure that you explicitly match any node whose contents (including descendants) you want to output.

Deciding What Output to Include

It's often necessary to defer decisions about what markup to emit until the input document has been read. For instance, you may want to change the contents of a FILENAME element into the HREF attribute of an A element, or replace one element type in the input with several different element types in the output depending on the value of an attribute. This is accomplished with xsl:element, xsl:attribute, xsl:processing-instruction, xsl:comment, and xsl:text elements. XSLT instructions are used in the contents of these elements and attribute value templates are used in the attribute values of these elements to vary their output.

Attribute value templates

Attribute value templates copy data from the input document to attribute values in the output. For example, suppose you want to convert the periodic table into empty ATOM elements with this attribute-based form:

<ATOM NAME="Vanadium"
  ATOMIC_WEIGHT="50.9415"
  ATOMIC_NUMBER="23"
/>

To do this, you need to extract the contents of elements in the input document and place those in attribute values in the output document. The first thing you're likely to attempt is something similar to this:

<xsl:template match="ATOM">
  <ATOM NAME="<xsl:value-of select='NAME'/>"
    ATOMIC_WEIGHT="<xsl:value-of select='ATOMIC_WEIGHT'/>"
    ATOMIC_NUMBER="<xsl:value-of select='ATOMIC_NUMBER'/>"
  />
</xsl:template>

But this is malformed XML. You can't use the < character inside an attribute value. Furthermore, it's extremely difficult to write software that can parse this in its most general case.

Instead, inside attribute values, data enclosed in curly braces {} takes the place of the xsl:value-of element. The correct way to write the above template is like this:

<xsl:template match="ATOM">
  <ATOM NAME="{NAME}"
    ATOMIC_WEIGHT="{ATOMIC_WEIGHT}"
    ATOMIC_NUMBER="{ATOMIC_NUMBER}"
  />
</xsl:template>

In the output, {NAME} is replaced by the value of the NAME child element of the matched ATOM. {ATOMIC_WEIGHT} is replaced by the value of the ATOMIC_WEIGHT child element of the matched ATOM. {ATOMIC_NUMBER} is replaced by the value of the ATOMIC_NUMBER child element, and so on.

Attribute value templates can have more complicated patterns than merely an element name. In fact, you can use any XPath expression in an attribute value template. For example, this template rule matches DENSITY elements of the form used in Listing 17-1.

<xsl:template match="DENSITY">
  <BULK_PROPERTY
    NAME="DENSITY"
    ATOM="{../NAME}"
    VALUE="{normalize-space(.)}"
    UNITS="{@UNITS}"
  />
</xsl:template>

It converts them into BULK_PROPERTY elements that look like this:

<BULK_PROPERTY NAME="DENSITY" ATOM="Helium"
  VALUE="0.0001785" UNITS="grams/cubic centimeter"/>

Attribute values are not limited to a single attribute value template. You can combine an attribute value template with literal data or with other attribute value templates. For example, this template rule matches ATOM elements and replaces them with their name formatted as a link to a file in the format H.html, He.html, and so on. The filename is derived from the attribute value template {SYMBOL}, while the literal data provides the period and extension.

<xsl:template match="ATOM">
  <A HREF="{SYMBOL}.html">
    <xsl:value-of select="NAME"/>
  </A>
</xsl:template>

More than one attribute value template can be included in an attribute value. For example, this template rule includes the density units as part of the VALUE attribute rather than making them a separate attribute:

<xsl:template match="DENSITY">
  <BULK_PROPERTY
    NAME="DENSITY"
    ATOM="{../NAME}"
    VALUE="{normalize-space(.)} {@UNITS}"
  />
</xsl:template>
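Because curly braces introduce expressions, a literal brace in an attribute value has to be doubled: {{ produces { and }} produces }. A single } outside an expression is an error. This hypothetical rule mixes literal braces with an attribute value template.

```xml
<xsl:template match="ATOM">
  <!-- {{ and }} are literal braces; {SYMBOL} is an expression.
       For an atom whose SYMBOL is H, TEXT becomes: symbol = { H } -->
  <CODE TEXT="symbol = {{ {SYMBOL} }}"/>
</xsl:template>
```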

You can use attribute value templates in many attributes in an XSLT style sheet. This is particularly important in xsl:element, xsl:attribute, and xsl:processing-instruction elements where attribute value templates allow the designer to defer the decision about exactly what element, attribute, or processing instruction appears in the output until the input document is read. You cannot use attribute value templates as the value of a select or match attribute, an xmlns attribute, an attribute that provides the name of another XSLT instruction element, or an attribute of a top-level element (one that's an immediate child of xsl:stylesheet).

Inserting elements into the output with xsl:element

Elements are usually included in the output document simply by including the literal start and end tags in template content. For instance, to insert a P element you merely type <P> and </P> at the appropriate points in the style sheet. However, occasionally you need to use details from the input document to determine which element to place in the output document. This might happen, for example, when making a transformation from a source vocabulary that uses attributes for information to an output vocabulary that uses elements for the same information.

The xsl:element element inserts an element into the output document. The name of the element is given by an attribute value template in the name attribute of xsl:element. The content of the element derives from the content of the xsl:element element, which may include xsl:attribute, xsl:processing-instruction, and xsl:comment instructions (all discussed below) to insert these items.

For example, suppose you want to replace the ATOM elements with GAS, LIQUID, and SOLID elements, depending on the value of the STATE attribute. Using xsl:element, a single rule can do this by converting the value of the STATE attribute to an element name. This is how it works:

<xsl:template match="ATOM">
  <xsl:element name="{@STATE}">
    <NAME><xsl:value-of select="NAME"/></NAME>
    <!-- rules for other children -->
  </xsl:element>
</xsl:template>

By using more complicated attribute value templates, you can perform most of the calculations that you might need.
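The following minimal sketch runs the xsl:element rule through the JDK's built-in XSLT processor. Its hypothetical three-atom input supplies one STATE value of each kind. The class and helper names, the OUT wrapper, and the added xsl:output element are illustrative assumptions, not part of the chapter's listings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ElementDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<OUT><xsl:apply-templates select='PERIODIC_TABLE/ATOM'/></OUT>"
            + "</xsl:template>"
            // The output element's name is computed from the STATE attribute
            + "<xsl:template match='ATOM'>"
            + "<xsl:element name='{@STATE}'>"
            + "<NAME><xsl:value-of select='NAME'/></NAME>"
            + "</xsl:element>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM STATE='GAS'><NAME>Hydrogen</NAME></ATOM>"
            + "<ATOM STATE='LIQUID'><NAME>Mercury</NAME></ATOM>"
            + "<ATOM STATE='SOLID'><NAME>Gold</NAME></ATOM>"
            + "</PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

A single template rule produces GAS, LIQUID, and SOLID elements. The element name comes from the input rather than from the style sheet.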

Inserting attributes into the output with xsl:attribute

You can include attributes in the output document simply by typing the literal attributes themselves. For instance, to insert a DIV element with an ALIGN attribute bearing the value CENTER, you merely type <DIV ALIGN="CENTER"> and </DIV> at the appropriate points in the style sheet. However, you frequently have to rely on data that you read from the input document to determine an attribute value and sometimes even to determine the attribute name.

For example, suppose you want a style sheet that selects atom names and formats them as links to files named H.html, He.html, Li.html, and so forth like this:

<LI><A HREF="H.html">Hydrogen</A></LI>
<LI><A HREF="He.html">Helium</A></LI>
<LI><A HREF="Li.html">Lithium</A></LI>

Each different element in the input will have a different value for the HREF attribute. The xsl:attribute element calculates an attribute name and value and inserts it into the output. Each xsl:attribute element is a child of either an xsl:element element or a literal element. The attribute calculated by xsl:attribute will be attached to the element calculated by its parent in the output. The name of the attribute is specified by the name attribute of the xsl:attribute element. The value of the attribute is given by the contents of the xsl:attribute element. For example, this template rule produces the output shown above:

<xsl:template match="ATOM">
  <LI><A>
    <xsl:attribute name="HREF">
      <xsl:value-of select="SYMBOL"/>.html</xsl:attribute>
    <xsl:value-of select="NAME"/>
  </A></LI>
</xsl:template>

All xsl:attribute elements must come before any other content of their parent element. You can't add an attribute to an element after you've already started writing out its content. For example, this template is in error (an XSLT processor may either report the error or silently ignore the misplaced attribute):

<xsl:template match="ATOM">
  <LI><A>
    <xsl:value-of select="NAME"/>
    <xsl:attribute name="HREF">
      <xsl:value-of select="SYMBOL"/>.html
    </xsl:attribute>
  </A></LI>
</xsl:template>
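Here is a minimal sketch of the legal version, run through the JDK's built-in XSLT processor. The xsl:attribute element comes first inside A, and its content is written on one line so that no stray white space ends up in the HREF value. The class and helper names, the UL wrapper, the three-atom input, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AttributeDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<UL><xsl:apply-templates select='PERIODIC_TABLE/ATOM'/></UL>"
            + "</xsl:template>"
            + "<xsl:template match='ATOM'>"
            + "<LI><A>"
            // xsl:attribute must precede any other content of the A element
            + "<xsl:attribute name='HREF'>"
            + "<xsl:value-of select='SYMBOL'/>.html"
            + "</xsl:attribute>"
            + "<xsl:value-of select='NAME'/>"
            + "</A></LI>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>"
            + "<ATOM><NAME>Helium</NAME><SYMBOL>He</SYMBOL></ATOM>"
            + "<ATOM><NAME>Lithium</NAME><SYMBOL>Li</SYMBOL></ATOM>"
            + "</PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```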

Defining attribute sets

You often need to apply the same group of attributes to many different elements, of either the same or different classes. For instance, you might want to apply a style attribute to each cell in an HTML table. To make this simpler, you can define one or more attributes as members of an attribute set at the top level of the style sheet with xsl:attribute-set, and then include that attribute set in an element with an xsl:use-attribute-sets attribute.

For example, this xsl:attribute-set element defines an attribute set named cellstyle with a font-family attribute of New York, Times New Roman, Times, serif and a font-size attribute of 12pt.

<xsl:attribute-set name="cellstyle">
  <xsl:attribute name="font-family">New York, Times New Roman, Times, serif</xsl:attribute>
  <xsl:attribute name="font-size">12pt</xsl:attribute>
</xsl:attribute-set>

This template rule then applies those attributes to td elements in the output.

<xsl:template match="ATOM">
  <tr>
    <td xsl:use-attribute-sets="cellstyle">
      <xsl:value-of select="NAME"/>
    </td>
    <td xsl:use-attribute-sets="cellstyle">
      <xsl:value-of select="ATOMIC_NUMBER"/>
    </td>
  </tr>
</xsl:template>

An element can use more than one attribute set by specifying the names of all the sets in a whitespace-separated list in the value of the xsl:use-attribute-sets attribute. All attributes from all the sets are applied to the element. For example, this td element possesses attributes from both the cellstyle and the numberstyle attribute sets.

    <td xsl:use-attribute-sets="cellstyle numberstyle">
      <xsl:value-of select="ATOMIC_NUMBER"/>
    </td>

If more than one attribute set defines the same attribute, then the last attribute set mentioned is used. If there is more than one attribute set with the same name (as may happen when one style sheet imports another) then the attributes in the sets are merged. If the identically named attribute sets define the same attribute, then the value from the set with higher importance is chosen. A style sheet in which multiple attribute sets of the same importance with the same name define the same attribute is in error.

You can also include attribute sets in particular elements by adding a use-attribute-sets attribute to an xsl:element, xsl:copy, or xsl:attribute-set element. For example,

    <xsl:element name="td" use-attribute-sets="cellstyle">
      <xsl:value-of select="ATOMIC_NUMBER"/>
    </xsl:element>

The xsl: prefix is unnecessary (and in fact prohibited) when use-attribute-sets is an attribute of an XSLT element rather than of a literal result element.
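Both spellings can be checked with a minimal sketch that uses the JDK's built-in XSLT processor. The literal td element uses the prefixed xsl:use-attribute-sets, and xsl:element uses the unprefixed use-attribute-sets with two sets. The numberstyle set (a single align attribute), the class and helper names, the input document, and the added xsl:output element are hypothetical. The font list is shortened for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AttributeSetDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:attribute-set name='cellstyle'>"
            + "<xsl:attribute name='font-family'>Times, serif</xsl:attribute>"
            + "<xsl:attribute name='font-size'>12pt</xsl:attribute>"
            + "</xsl:attribute-set>"
            // Hypothetical second set, used to show that sets combine
            + "<xsl:attribute-set name='numberstyle'>"
            + "<xsl:attribute name='align'>right</xsl:attribute>"
            + "</xsl:attribute-set>"
            + "<xsl:template match='/'>"
            + "<table><xsl:apply-templates select='PERIODIC_TABLE/ATOM'/></table>"
            + "</xsl:template>"
            + "<xsl:template match='ATOM'>"
            + "<tr>"
            // Literal result element: the attribute needs the xsl: prefix
            + "<td xsl:use-attribute-sets='cellstyle'><xsl:value-of select='NAME'/></td>"
            // XSLT instruction element: the attribute takes no prefix
            + "<xsl:element name='td' use-attribute-sets='cellstyle numberstyle'>"
            + "<xsl:value-of select='ATOMIC_NUMBER'/>"
            + "</xsl:element>"
            + "</tr>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE><ATOM><NAME>Hydrogen</NAME>"
            + "<ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM></PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Both cells receive the cellstyle attributes. Only the second one also gets align="right".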

Generating processing instructions with xsl:processing-instruction

The xsl:processing-instruction element places a processing instruction in the output document. The target of the processing instruction is specified by a required name attribute. The contents of the xsl:processing-instruction element become the contents of the processing instruction. For example, this rule replaces PROGRAM elements with a gcc processing instruction:

<xsl:template match="PROGRAM">
  <xsl:processing-instruction name="gcc"> -O4
  </xsl:processing-instruction>
</xsl:template>

PROGRAM elements in the input are replaced by this processing instruction in the output:

<?gcc -O4
  ?>

The contents of the xsl:processing-instruction element can include xsl:value-of elements and xsl:apply-templates elements provided the result of these instructions is pure text. For example,

<xsl:template match="PROGRAM">
  <xsl:processing-instruction name="gcc">-O4 
    <xsl:value-of select="NAME"/>
  </xsl:processing-instruction>
</xsl:template>

The xsl:processing-instruction element may not contain xsl:element and other instructions that produce elements and attributes in the result. Furthermore, xsl:processing-instruction may not include any instructions or literal text that insert a ?> in the output because that would prematurely end the processing instruction.

Generating comments with xsl:comment

The xsl:comment element inserts a comment in the output document. It has no attributes. Its contents are the text of the comment. For example,

<xsl:template match="ATOM">
  <xsl:comment>There was an atom here once.</xsl:comment>
</xsl:template>

This rule replaces ATOM nodes with this comment:

<!--There was an atom here once.-->

The contents of the xsl:comment element can include xsl:value-of elements and xsl:apply-templates elements provided the results of these instructions are pure text. It may not contain xsl:element and other instructions that produce elements and attributes in the result. Furthermore, xsl:comment may not include any instructions or literal text that inserts a double hyphen in the comment. This would result in a malformed comment in the output.
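As a minimal sketch, assuming the JDK's built-in XSLT processor, this variant of the rule mixes literal comment text with an xsl:value-of result. The class and helper names, the OUT wrapper, the one-atom input, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CommentDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<OUT><xsl:apply-templates select='PERIODIC_TABLE/ATOM'/></OUT>"
            + "</xsl:template>"
            // The comment text combines literal text with pure-text xsl:value-of output
            + "<xsl:template match='ATOM'>"
            + "<xsl:comment>There was an atom named <xsl:value-of select='NAME'/>"
            + " here once.</xsl:comment>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE><ATOM><NAME>Hydrogen</NAME></ATOM></PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```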

Generating text with xsl:text

The xsl:text element inserts its contents into the output document as literal text. For example, this rule replaces each ATOM element with the string "There was an atom here once."

<xsl:template match="ATOM">
  <xsl:text>There was an atom here once.</xsl:text>
</xsl:template>

The xsl:text element isn't much used because most of the time it's easier to simply type the text. However, xsl:text does have a couple of advantages. The first is that it preserves white space exactly, even if the node contains nothing but white space. By default, XSLT processors delete all text nodes from the style sheet that contain only white space, so xsl:text is the way to output a lone space or line break. Preserving white space exactly is useful when dealing with poetry, computer source code, or other text in which white space is significant.

The second advantage is that it enables you to insert unescaped < and & into your output document that are not converted to &lt; and &amp;. To do this, place the general entity reference for the symbol (&lt; or &amp;) in an xsl:text element; then set the xsl:text element's disable-output-escaping attribute to yes. This can be useful when you need to include JavaScript source code in the output document. For example,

<xsl:template match="SCRIPT">
  <script language="javascript">
    <xsl:text disable-output-escaping="yes">
      &lt;!--
      if (location.host.toLowerCase().indexOf("ibiblio") &lt; 0) {
        location.href="http://www.cafeconleche.org/";
      }
      // --&gt;
    </xsl:text>
  </script>
</xsl:template>

This may produce output that is not well-formed XML. (In this example the unescaped < happens to fall inside an HTML comment, but in general disable-output-escaping offers no such guarantee.) However, if you're trying to write a non-XML format such as HTML or TeX this may be what you want. Note, however, that the style sheet and the input document are both still well-formed XML.

Copying the Context Node with xsl:copy

The xsl:copy element copies the current node into the output tree. Child elements, attributes, and other content are not automatically copied. However, the contents of the xsl:copy element form a template that can select these things to be copied as well. This is often useful when transforming a document from one markup vocabulary to the same or a closely related markup vocabulary. For example, this template rule strips the attributes and child elements off an ATOM and replaces them with the ATOM's text value enclosed in a b element:

<xsl:template match="ATOM">
   <xsl:copy>
     <b><xsl:value-of select="."/></b>
   </xsl:copy>
</xsl:template>

One useful template xsl:copy makes possible is the identity transformation; that is, a transformation from a document into itself. Such a transformation looks like this:

<xsl:template 
  match="*|@*|comment()|processing-instruction()|text()">
  <xsl:copy>
    <xsl:apply-templates
     select="*|@*|comment()|processing-instruction()|text()"/>
  </xsl:copy>
</xsl:template>

You can adjust the identity transformation a little to produce similar documents. For example, Listing 17-15 is a style sheet that strips comments from a document, leaving the document otherwise untouched. It resulted from leaving the comment() node out of the match and select attribute values in the identity transformation.

Listing 17-15: An XSLT style sheet that strips comments from a document

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template 
     match="*|@*|processing-instruction()|text()">
    <xsl:copy>
      <xsl:apply-templates
        select="*|@*|processing-instruction()|text()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
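To check that Listing 17-15 drops comments but keeps everything else, here is a minimal sketch. It runs the listing's template through the JDK's built-in XSLT processor on a tiny hypothetical document that contains a comment, an attribute, and a processing instruction (the render target is made up). The class and helper names are hypothetical, and the xsl:output element is an addition so the result prints without an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripCommentsDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            // Identity transformation with comment() left out of match and select
            + "<xsl:template match='*|@*|processing-instruction()|text()'>"
            + "<xsl:copy>"
            + "<xsl:apply-templates select='*|@*|processing-instruction()|text()'/>"
            + "</xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE><!-- draft --><ATOM STATE='GAS'>"
            + "<?render fast?><NAME>Hydrogen</NAME></ATOM></PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```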

xsl:copy only copies the current node. You can copy other nodes, possibly more than one of them, using xsl:copy-of. The select attribute of xsl:copy-of chooses the nodes to be copied, and each one is copied together with all its attributes and descendants. For example, Listing 17-16 is a style sheet that uses xsl:copy-of to strip out atoms without melting points from the periodic table by copying only ATOM elements that have MELTING_POINT children.

Listing 17-16: A style sheet that copies only ATOM elements that have MELTING_POINT children

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/PERIODIC_TABLE">
      <PERIODIC_TABLE>
        <xsl:apply-templates select="ATOM"/>
      </PERIODIC_TABLE>
    </xsl:template>
    <xsl:template match="ATOM">
      <xsl:apply-templates select="MELTING_POINT"/>
    </xsl:template>
    <xsl:template match="MELTING_POINT">
      <xsl:copy-of select=".."/>
    </xsl:template>
</xsl:stylesheet>

Note

Listings 17-15 and 17-16 are examples of XSL transformations from a source vocabulary to the same vocabulary. Unlike most of the examples in this chapter, they do not transform to well-formed HTML.

Counting Nodes with xsl:number

The xsl:number element inserts a formatted integer into the output document. The value of the integer is given by the value attribute. This attribute contains an XPath expression whose result is converted to a number, rounded to the nearest integer, and then formatted according to the value of the format attribute. Reasonable defaults are provided for both these attributes. For example, consider the style sheet for the ATOM elements in Listing 17-17.

Listing 17-17: An XSLT style sheet that counts atoms

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PERIODIC_TABLE">
      <html>
        <head><title>The Elements</title></head>
        <body>
          <table>
            <tr><xsl:apply-templates select="ATOM"/></tr>
          </table>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="ATOM">
      <td><xsl:number value="ATOMIC_NUMBER"/></td>    
      <td><xsl:value-of select="NAME"/></td>
    </xsl:template>
</xsl:stylesheet>

When this style sheet is applied to Listing 17-1, the output appears like this:

<html>
<head>
<title>The Elements</title>
</head>
<body>
<table>
<tr>
<td>1</td><td>Hydrogen</td><td>2</td><td>Helium</td>
</tr>
</table>
</body>
</html>

Each element is matched with its atomic number. The value attribute can contain any data that XPath knows how to convert to a number. In this case, the ATOMIC_NUMBER child element of the matched ATOM is converted.

Default numbers

If you use the value attribute to calculate the number, that's all you need. However, if the value attribute is omitted, then the position of the current node in the source tree is used as the number. For example, consider Listing 17-18, which produces a table of atoms that have boiling points less than or equal to the boiling point of nitrogen.

Listing 17-18: An XSLT style sheet that counts atoms

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PERIODIC_TABLE">
      <html>
        <head><title>The Elements</title></head>
        <body>
          <table>
            <tr>
              <td>Name</td>
              <td>Position</td>
              <td>Default Number</td>
              <td>Boiling Point</td>
            </tr>
            <xsl:apply-templates 
              select="ATOM[BOILING_POINT &lt;= 77.344]"/>
          </table>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="ATOM">
      <tr>
        <td><xsl:value-of select="NAME"/></td>
        <td><xsl:number value="position()"/></td>
        <td><xsl:number/></td>
        <td><xsl:number value="BOILING_POINT"/></td>
      </tr>
    </xsl:template>
</xsl:stylesheet>

Figure 17-5 shows the finished table produced by applying this style sheet to the complete periodic table. It shows that the default value calculated by xsl:number is the position of the node among its sibling nodes of the same type (ATOM elements in this case). This is not the same as the number returned by the position() function, which calculates position only relative to the other nodes in the current node list (the nodes selected by xsl:apply-templates: hydrogen, helium, nitrogen, and neon in this example). You can change what xsl:number counts using three attributes: level, count, and from.

Image: 17/picture9.jpg

Figure 17-5: Atoms with boiling points less than or equal to nitrogen's
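The difference between position() and the default xsl:number can be reproduced with a minimal sketch on a four-atom excerpt. It uses the JDK's built-in XSLT processor, and boiling points are in kelvin as in the chapter's data. Lithium sits between helium and nitrogen in the source but is not selected. The class and helper names, the compact NAME:position:number row format, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NumberDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='PERIODIC_TABLE'>"
            + "<TABLE><xsl:apply-templates select='ATOM[BOILING_POINT &lt;= 77.344]'/></TABLE>"
            + "</xsl:template>"
            + "<xsl:template match='ATOM'>"
            // position() counts within the selected node list;
            // a bare xsl:number counts this ATOM among all its ATOM siblings
            + "<ROW><xsl:value-of select='NAME'/>"
            + ":<xsl:number value='position()'/>"
            + ":<xsl:number/></ROW>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM><NAME>Hydrogen</NAME><BOILING_POINT>20.28</BOILING_POINT></ATOM>"
            + "<ATOM><NAME>Helium</NAME><BOILING_POINT>4.216</BOILING_POINT></ATOM>"
            + "<ATOM><NAME>Lithium</NAME><BOILING_POINT>1615</BOILING_POINT></ATOM>"
            + "<ATOM><NAME>Nitrogen</NAME><BOILING_POINT>77.344</BOILING_POINT></ATOM>"
            + "</PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Nitrogen is third in the selected list but fourth among its ATOM siblings, so its row reads Nitrogen:3:4.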

The level attribute

By default, with no value attribute, xsl:number counts the current node and its preceding siblings of the same type. For instance, if the ATOMIC_NUMBER elements were numbered instead of ATOM elements, none would have a number higher than 1 because an ATOM never has more than one ATOMIC_NUMBER child. Although the document contains more than one ATOMIC_NUMBER element, these are not siblings.

Setting the level attribute of xsl:number to any counts all of the elements of the same kind as the current node that appear before it in the document, plus the current node itself. This includes not just the ones in the current node list, but all nodes of the same type. Even if you select only the atomic numbers of the gases, for example, the solids and liquids would still count, even if they weren't output. Consider these rules:

<xsl:template match="ATOM">
  <tr><xsl:apply-templates select="NAME"/></tr>
</xsl:template>
<xsl:template match="NAME">
  <td><xsl:number level="any"/></td>
  <td><xsl:value-of select="."/></td>
</xsl:template>

Because level is set to any, these templates produce output like this, where the numbering doesn't restart at 1 with each new NAME element:

<tr>
<td>1</td><td>Hydrogen</td>
</tr>
<tr>
<td>2</td><td>Helium</td>
</tr>

If you remove the level attribute or set it to its default value of single, then the output looks like this:

<tr>
<td>1</td><td>Hydrogen</td>
</tr>
<tr>
<td>1</td><td>Helium</td>
</tr>
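Both numbering levels can be printed side by side with a minimal sketch that runs the NAME rule through the JDK's built-in XSLT processor. The rule emits one cell with level="any" and one with the default level="single". The class and helper names, the two-atom input, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class LevelDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='PERIODIC_TABLE'>"
            + "<TABLE><xsl:apply-templates select='ATOM'/></TABLE>"
            + "</xsl:template>"
            + "<xsl:template match='ATOM'>"
            + "<tr><xsl:apply-templates select='NAME'/></tr>"
            + "</xsl:template>"
            + "<xsl:template match='NAME'>"
            // level='any' keeps counting NAME elements across ATOM boundaries
            + "<td><xsl:number level='any'/></td>"
            // level='single' (the default) only counts NAME siblings
            + "<td><xsl:number level='single'/></td>"
            + "<td><xsl:value-of select='.'/></td>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM><NAME>Hydrogen</NAME></ATOM>"
            + "<ATOM><NAME>Helium</NAME></ATOM>"
            + "</PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```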

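A less common option sets the level attribute of xsl:number to multiple. Rather than a single integer, this produces a list of numbers, one for the current node and one for each of its ancestors that matches the count pattern, each counted among its own siblings. With format="1.1" the result looks like 2.3.1, which is convenient for numbering nested sections and subsections.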

The count attribute

By default, with no value attribute, only elements of the same type as the element of the current node get counted. However, you can set the count attribute of xsl:number to a pattern that specifies what to count. For instance, this rule matches all the child elements of an ATOM. It places a number in front of each one that represents its position among all the children of that ATOM.

<xsl:template match="ATOM/*">
  <td><xsl:number count="*"/></td>
  <td><xsl:value-of select="."/></td>
</xsl:template>

The output from this template looks like this:

    <td>1</td><td>Hydrogen</td>
    <td>2</td><td>H</td>
    <td>3</td><td>1</td>
    <td>4</td><td>1.00794</td>
    <td>5</td><td>20.28</td>
    <td>6</td><td>13.81</td>
    <td>7</td><td>
      0.0000899
    </td>
    <td>1</td><td>Helium</td>
    <td>2</td><td>He</td>
    <td>3</td><td>2</td>
    <td>4</td><td>4.0026</td>
    <td>5</td><td>4.216</td>
    <td>6</td><td>0.95</td>
    <td>7</td><td>
      0.0001785
    </td>

The from attribute

The from attribute contains a pattern that specifies where counting starts in the input tree. However, the counting still begins from 1, not 2 or 10 or some other number. The from attribute only changes which node counting restarts after. It is most useful when level="any". For example, <xsl:number level="any" count="FIGURE" from="CHAPTER"/> numbers FIGURE elements consecutively within each CHAPTER, starting over at 1 in every new chapter. With level="single" or level="multiple", from limits which ancestors are considered when counting.

Number to string conversion

Until now, I've implicitly assumed that numbers looked like 1, 2, 3, and so on; that is, European numerals starting from 1 and counting by 1. However, that's not the only possibility. For instance, the page numbers in the preface and other front matter of books often appear in small Roman numerals like i, ii, iii, iv, and so on. And different countries use different conventions to group the digits, separate the integer and fractional parts of a real number, and represent the symbols for the various digits. Most of these are adjustable through four attributes of xsl:number: format, letter-value, grouping-separator, and grouping-size.

The format attribute

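You can adjust the numbering style used by xsl:number using the format attribute. This attribute generally has one of the following values: 1 for 1, 2, 3, 4, . . . ; A for A, B, C, . . . , Z, AA, AB, . . . ; a for a, b, c, . . . , z, aa, ab, . . . ; I for uppercase Roman numerals I, II, III, IV, . . . ; or i for lowercase Roman numerals i, ii, iii, iv, . . . . The default is 1.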

For example, this rule numbers the atoms with capital Roman numerals:

<xsl:template match="ATOM">
  <P>
    <xsl:number value="position()" format="I"/>
    <xsl:value-of select="."/>
  </P>
</xsl:template>

You can specify decimal numbering with leading zeroes by writing the format token with as many digits as the minimum width you want. For instance, setting format="01" produces the sequence 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, . . . . You might find this useful when lining numbers up in columns.
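Here is a minimal sketch of several format tokens, run through the JDK's built-in XSLT processor with fixed value attributes so that the results are easy to predict. The class and helper names, the N and OUT wrapper elements, the placeholder input document, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FormatDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<OUT>"
            + "<N><xsl:number value='1999' format='I'/></N>" // uppercase Roman
            + "<N><xsl:number value='4' format='i'/></N>"    // lowercase Roman
            + "<N><xsl:number value='3' format='A'/></N>"    // uppercase letters
            + "<N><xsl:number value='7' format='01'/></N>"   // padded to two digits
            + "<N><xsl:number value='12' format='01'/></N>"  // already two digits
            + "</OUT>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        return transform(xsl, "<DOC/>");
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```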

The letter-value attribute

The letter-value attribute distinguishes between letters interpreted as numbers and letters interpreted as letters. For instance, if you want to use format="I" to start the sequence I, J, K, L, M, N, . . . instead of I, II, III, IV, V, VI, . . . you would set the letter-value attribute to the keyword alphabetic. The keyword traditional specifies a numeric sequence. For example,

<xsl:template match="ATOM">
  <P>
   <xsl:number value="position()"
               format="I" letter-value="alphabetic"/>
   <xsl:value-of select="."/>
  </P>
</xsl:template>

Grouping attributes

In the United States, we tend to write large numbers with commas grouping every three digits; for example, 4,567,302,000. However, in many languages and countries, a period or a space separates the groups instead; for instance, 4.567.302.000 or 4 567 302 000. Furthermore, in some countries it's customary to group large numbers every four digits instead of every three; for example, 45,6730,2000. If you're dealing with very long lists that may contain a thousand or more items, you need to worry about these issues.

The grouping-separator attribute specifies the grouping separator used between groups of digits. The grouping-size attribute specifies the number of digits used in a group. Generally, you'd make these attributes contingent on the language. For example,

<xsl:number grouping-separator=" " grouping-size="3"/>

Sorting Output Elements

The xsl:sort element sorts the output elements into a different order than they appear in the input. An xsl:sort element appears as a child of an xsl:apply-templates element or xsl:for-each element. The select attribute of the xsl:sort element defines the key used to sort the element's output by xsl:apply-templates or xsl:for-each.

By default, sorting is performed in alphabetical order of the keys. If more than one xsl:sort element is present in a given xsl:apply-templates or xsl:for-each element, then the elements are sorted first by the first key, then by the second key, and so on. If any elements still compare equally, they are output in the order they appear in the source document.

For example, suppose you have a file full of ATOM elements arranged alphabetically. To sort by atomic number, you can use the style sheet in Listing 17-19.

Listing 17-19: An XSLT style sheet that sorts by atomic number

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PERIODIC_TABLE">
      <html>
        <head>
          <title>Atomic Number vs. Atomic Weight</title>
        </head>
        <body>
          <h1>Atomic Number vs. Atomic Weight</h1>
          <table>
            <tr>
              <th>Element</th>
              <th>Atomic Number</th>
              <th>Atomic Weight</th>
            </tr>
            <xsl:apply-templates select="ATOM">
              <xsl:sort select="ATOMIC_NUMBER"/>
            </xsl:apply-templates>
          </table>
        </body>
      </html>
    </xsl:template>
    <xsl:template match="ATOM">
      <tr>
        <td><xsl:apply-templates select="NAME"/></td>
        <td><xsl:apply-templates select="ATOMIC_NUMBER"/></td>
        <td><xsl:apply-templates select="ATOMIC_WEIGHT"/></td>
      </tr>
    </xsl:template>
</xsl:stylesheet>

Figure 17-6 shows the limits of alphabetical sorting. Hydrogen, atomic number 1, is the first element. However, the second element is not helium, atomic number 2, but rather neon, atomic number 10. Although 10 is greater than 2 numerically, alphabetically 10 falls before 2 because the character 1 sorts before the character 2.

Image: 17/picture11.jpg

Figure 17-6: Atoms alphabetically sorted by atomic number

You can, however, adjust the order of the sort by setting the optional data-type attribute to the value number. For example,

<xsl:sort data-type="number" select="ATOMIC_NUMBER"/>

Figure 17-7 shows the elements sorted properly.

Image: 17/picture13.jpg

Figure 17-7: Atoms numerically sorted by atomic number

You can change the order of the sort from the default ascending order to descending by setting the order attribute to descending like this:

<xsl:sort order="descending"
          data-type="number"
          select="ATOMIC_NUMBER"/>

This sorts the elements from the largest atomic number to the smallest so that hydrogen now appears last in the list.

Alphabetical sorting naturally depends on the alphabet. The lang attribute can set the language of the keys. The value of this attribute should be an ISO 639 language code such as en for English. However, processors are not required to know how to sort in all the different languages that might be encountered in XML. While English sorting is fairly straightforward, many other languages require much more complicated algorithms. Indeed a few languages actually have multiple standard ways of sorting based on different criteria. The lang attribute is ignored if data-type is number.

Cross-Reference

These are the same values supported by the xml:lang attribute discussed in Chapter 11.

Finally, you can set the case-order attribute to one of the two values upper-first or lower-first to specify whether uppercase letters sort before lowercase letters or vice versa. The default depends on the language.
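The following minimal sketch uses the JDK's built-in XSLT processor to compare text, numeric, and descending numeric sorts on three atoms stored alphabetically. It uses xsl:for-each rather than xsl:apply-templates only to keep the style sheet short. The class and helper names, the TEXT, NUM, DESC, and OUT elements, the trailing-comma list format, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SortDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='PERIODIC_TABLE'>"
            + "<OUT>"
            // Default data-type='text': keys compared as strings
            + "<TEXT><xsl:for-each select='ATOM'>"
            + "<xsl:sort select='ATOMIC_NUMBER'/>"
            + "<xsl:value-of select='NAME'/>,</xsl:for-each></TEXT>"
            // data-type='number': keys compared numerically
            + "<NUM><xsl:for-each select='ATOM'>"
            + "<xsl:sort select='ATOMIC_NUMBER' data-type='number'/>"
            + "<xsl:value-of select='NAME'/>,</xsl:for-each></NUM>"
            // Numeric, largest first
            + "<DESC><xsl:for-each select='ATOM'>"
            + "<xsl:sort select='ATOMIC_NUMBER' data-type='number' order='descending'/>"
            + "<xsl:value-of select='NAME'/>,</xsl:for-each></DESC>"
            + "</OUT>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM><NAME>Helium</NAME><ATOMIC_NUMBER>2</ATOMIC_NUMBER></ATOM>"
            + "<ATOM><NAME>Hydrogen</NAME><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>"
            + "<ATOM><NAME>Neon</NAME><ATOMIC_NUMBER>10</ATOMIC_NUMBER></ATOM>"
            + "</PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The text sort puts neon between hydrogen and helium, the same problem shown in Figure 17-6. The numeric sorts fix it.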

Modes

Sometimes you want to include the same content from the source document in the output document multiple times. That's easy to do simply by applying templates multiple times, once in each place where you want the data to appear. However, suppose you want the data to be formatted differently in different locations? That's a little trickier.

For example, suppose you want the output of processing the periodic table to be a series of 100 links to more detailed descriptions of the individual atoms. In this case, the output document would start like this:

<UL>
<LI><A HREF="#Ac">Actinium</A></LI>
<LI><A HREF="#Al">Aluminum</A></LI>
<LI><A HREF="#Am">Americium</A></LI>
<LI><A HREF="#Sb">Antimony</A></LI>
<LI><A HREF="#Ar">Argon</A></LI>
. . .

Later in the document, the actual atom descriptions would appear, formatted like this:

<H3>
<A NAME="H">Hydrogen</A>
</H3>
<P>
    Hydrogen
    H
    1
    1.00794
    20.28
    13.81
      0.0000899
  </P>

This sort of application is common anytime you automatically generate a table of contents or an index. The NAME of the atom must be formatted differently in the table of contents than in the body of the document. You need two different rules that both apply to the ATOM element at different places in the document. The solution is to give each of the different rules a mode attribute. Then you can choose which template to apply by setting the mode attribute of the xsl:apply-templates element. Listing 17-20 demonstrates.

Listing 17-20: An XSLT style sheet that uses modes to format the same data differently in two different places

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/PERIODIC_TABLE">
    <HTML>
      <HEAD><TITLE>The Elements</TITLE></HEAD>
      <BODY>
        <H2>Table of Contents</H2>
        <UL>
          <xsl:apply-templates select="ATOM" mode="toc"/>
        </UL>
        <H2>The Elements</H2>
        <xsl:apply-templates select="ATOM" mode="full"/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="ATOM" mode="toc">
    <LI><A>
      <xsl:attribute name="HREF">#<xsl:value-of
        select="SYMBOL"/></xsl:attribute>
      <xsl:value-of select="NAME"/>
    </A></LI>
  </xsl:template>
  <xsl:template match="ATOM" mode="full">
    <H3><A>
      <xsl:attribute name="NAME">
        <xsl:value-of select="SYMBOL"/>
      </xsl:attribute>
      <xsl:value-of select="NAME"/>
    </A></H3>
      <P>
        <xsl:value-of select="."/>
      </P>
  </xsl:template>
</xsl:stylesheet>

The default template rule for nodes preserves modes. That is, for every mode n you declare in your style sheet, the XSLT processor adds one template rule that applies specifically to that mode and looks like this:

<xsl:template match="*|/" mode="n">
  <xsl:apply-templates mode="n"/>
</xsl:template>

As usual, you are free to override this default rule with one of your own design.
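Here is a minimal sketch of the two-mode idea from Listing 17-20, run through the JDK's built-in XSLT processor. For brevity it builds the HREF and NAME attributes with attribute value templates instead of xsl:attribute. It also omits the HTML head and the full-description paragraph. The class and helper names, the two-atom input, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ModeDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/PERIODIC_TABLE'>"
            + "<BODY>"
            + "<UL><xsl:apply-templates select='ATOM' mode='toc'/></UL>"
            + "<xsl:apply-templates select='ATOM' mode='full'/>"
            + "</BODY>"
            + "</xsl:template>"
            // Same match pattern, mode 'toc': a table-of-contents link
            + "<xsl:template match='ATOM' mode='toc'>"
            + "<LI><A HREF='#{SYMBOL}'><xsl:value-of select='NAME'/></A></LI>"
            + "</xsl:template>"
            // Same match pattern, mode 'full': the link target heading
            + "<xsl:template match='ATOM' mode='full'>"
            + "<H3><A NAME='{SYMBOL}'><xsl:value-of select='NAME'/></A></H3>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>"
            + "<ATOM><NAME>Helium</NAME><SYMBOL>He</SYMBOL></ATOM>"
            + "</PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```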

Defining Constants with xsl:variable

Named constants help clean up code. They can replace commonly used boilerplate text with a simple name and reference. They can also make it easy to adjust boilerplate text that appears in multiple locations by simply changing the constant definition.

The xsl:variable element defines a named string for use elsewhere in the style sheet. Its required name attribute provides a name by which the variable can be referred to. The contents of the xsl:variable element provide the replacement text. (Alternatively, an optional select attribute can compute the value from an XPath expression, in which case the element must be empty.) For example, this xsl:variable element defines a variable with the name copy01 and the value Copyright 2001 Elliotte Rusty Harold:

<xsl:variable name="copy01">
  Copyright 2001 Elliotte Rusty Harold
</xsl:variable>

To access the value of this variable, you prefix a dollar sign to the name of the variable. To insert this in an attribute, use an attribute value template. For example:

<BLOCK COPYRIGHT="{$copy01}">
</BLOCK>

You can use xsl:value-of to insert the variable's replacement text into the output document as text:

<xsl:value-of select="$copy01"/>

The contents of the xsl:variable can contain markup including other XSLT instructions. This means that you can calculate the value of a variable based on other information, including the value of other variables. However, a variable may not refer to itself recursively, either directly or indirectly. For instance, the following example is in error:

<xsl:variable name="GNU">
  <xsl:value-of select="$GNU"/>'s not Unix
</xsl:variable>

Similarly, two variables may not refer to each other in a circular fashion like this:

<xsl:variable name="Thing1">
  Thing1 loves <xsl:value-of select="$Thing2"/>
</xsl:variable>
<xsl:variable name="Thing2">
  Thing2 loves <xsl:value-of select="$Thing1"/>
</xsl:variable>

xsl:variable elements can either be top-level children of the xsl:stylesheet root element or they can be included inside template rules. A variable present at the top level of a style sheet can be accessed anywhere in the style sheet. It’s a global variable. By contrast, a variable that’s declared inside a template rule is only accessible by its following sibling elements and their descendants (the scope of the variable). It’s a local variable. Local variables override global variables with the same name, so inside that scope the local definition wins. However, one local variable may not override another: it is an error for a variable declared inside a template to have the same name as another variable or parameter already in scope within that template.
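The scoping rules can be tried out with a minimal sketch that uses the JDK's built-in XSLT processor. It defines a global copy01 variable from element content and a global atomCount variable with a select attribute. A local copy01 then overrides the global one inside the ATOM rule only. The atomCount variable, the class and helper names, the BLOCK, COUNT, NOTE, and OUT elements, the two-atom input, and the added xsl:output element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class VariableDemo {
    // Applies an XSLT 1.0 style sheet to an XML document (both held in strings)
    // using the XSLT processor that ships with the JDK.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        String xsl = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            // Global variables: one from element content, one from a select expression
            + "<xsl:variable name='copy01'>Copyright 2001 Elliotte Rusty Harold</xsl:variable>"
            + "<xsl:variable name='atomCount' select='count(/PERIODIC_TABLE/ATOM)'/>"
            + "<xsl:template match='/'>"
            + "<OUT>"
            + "<BLOCK COPYRIGHT='{$copy01}'/>"
            + "<COUNT><xsl:value-of select='$atomCount'/></COUNT>"
            + "<xsl:apply-templates select='PERIODIC_TABLE/ATOM'/>"
            + "</OUT>"
            + "</xsl:template>"
            + "<xsl:template match='ATOM'>"
            // This local variable overrides the global copy01 inside this rule only
            + "<xsl:variable name='copy01'>Local notice for <xsl:value-of select='NAME'/>"
            + "</xsl:variable>"
            + "<NOTE><xsl:value-of select='$copy01'/></NOTE>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM><NAME>Hydrogen</NAME></ATOM>"
            + "<ATOM><NAME>Helium</NAME></ATOM>"
            + "</PERIODIC_TABLE>";
        return transform(xsl, xml);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```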

Named Templates

Variables are limited to basic text and markup. XSLT provides a more powerful macro facility that can wrap standard markup and text around changing data. For example, suppose you want an atom's atomic number, atomic weight, and other key values formatted as a table cell in small, bold Times in blue. In other words, you want the output to look like this:

<td>
  <font face="Times, serif" color="blue" size="2">
    <b>52</b>
  </font>
</td>

You can certainly include all that in a template rule like this:

<xsl:template match="ATOMIC_NUMBER">
  <td>
    <font face="Times, serif" color="blue" size="2">
      <b>
        <xsl:value-of select="."/>
      </b>
    </font>
  </td>
</xsl:template>

This markup can be repeated inside other template rules. When the detailed markup grows more complex, and when it appears in several different places in a style sheet, you may elect to turn it into a named template. Named templates resemble variables. However, they enable you to include data from the place where the template is applied, rather than merely inserting fixed text.

The xsl:template element can have a name attribute by which it can be invoked explicitly, even if it has no match attribute and is never applied indirectly through xsl:apply-templates. For example, this is a named template for the above pattern:

<xsl:template name="ATOM_CELL">
  <td>
    <font face="Times, serif" color="blue" size="2">
      <b>
        <xsl:value-of select="."/>
      </b>
    </font>
  </td>
</xsl:template>

The <xsl:value-of select="."/> element in the middle of the named template will be replaced by the string value of the current node at the point where the template was called. Calling a named template does not change the current node.

The xsl:call-template element appears in the contents of a template rule. It has a required name attribute that names the template it will call. When processed, the xsl:call-template element is replaced by the result of instantiating the xsl:template element it names. For example, you can now rewrite the ATOMIC_NUMBER rule like this by using the xsl:call-template element to call the ATOM_CELL named template:

<xsl:template match="ATOMIC_NUMBER">
  <xsl:call-template name="ATOM_CELL"/>
</xsl:template>

This fairly simple example only saves a few lines of code, but the more complicated the template, and the more times it's reused, the greater the reduction in complexity of the style sheet. Named templates also have the advantage, like variables, of factoring out common patterns in the style sheet so that you can edit them as one. For instance, if you decide to change the color of atomic number, atomic weight, and other key values from blue to red, you only need to change it once in the named template. You do not have to change it in each separate template rule. This facilitates greater consistency of style.
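To watch a named template at work, the following sketch (a hypothetical CallTemplateDemo class, again assuming the JDK's built-in javax.xml.transform processor) applies the ATOMIC_NUMBER rule to a one-element input. Because xsl:call-template does not change the current node, select="." inside ATOM_CELL still refers to the ATOMIC_NUMBER element that made the call.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CallTemplateDemo {
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/'>",
        "    <tr><xsl:apply-templates select='ATOM/ATOMIC_NUMBER'/></tr>",
        "  </xsl:template>",
        "  <xsl:template match='ATOMIC_NUMBER'>",
        "    <xsl:call-template name='ATOM_CELL'/>",
        "  </xsl:template>",
        "  <!-- The current node here is still the ATOMIC_NUMBER that made the call -->",
        "  <xsl:template name='ATOM_CELL'>",
        "    <td><font face='Times, serif' color='blue' size='2'>",
        "      <b><xsl:value-of select='.'/></b>",
        "    </font></td>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static final String XML = "<ATOM><ATOMIC_NUMBER>52</ATOMIC_NUMBER></ATOM>";

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```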

Passing Parameters to Templates

Each separate invocation of a template can pass parameters to the template to customize its output. This is done the same way for named templates and unnamed templates. In the xsl:template element, the parameters are represented as xsl:param child elements. In xsl:call-template or xsl:apply-templates elements, parameters are represented as xsl:with-param child elements.

For example, suppose you want to also include a link to a particular file for each atom cell. The output should look something like this:

<td>
  <font face="Times, serif" color="blue" size="2">
    <b>
      <a href="atomic_number.html">52</a>
    </b>
  </font>
</td>

The trick is that the value of the href attribute has to be passed in from the point where the template is invoked because it changes for each separate invocation of the template. For example, atomic weights will have to be formatted like this:

<td>
  <font face="Times, serif" color="blue" size="2">
    <b>
      <a href="atomic_weight.html">4.0026</a>
    </b>
  </font>
</td>

The template that supports this looks like this:

<xsl:template name="ATOM_CELL">
  <xsl:param name="file">index.html</xsl:param>
  <td>
    <font face="Times, serif" color="blue" size="2">
      <b>
        <a href="{$file}"><xsl:value-of select="."/></a>
      </b>
    </font>
  </td>
</xsl:template>

The name attribute of the xsl:param element gives the parameter a name (important if there are multiple arguments) and the contents of the xsl:param element supply a default value for this parameter to be used if the invocation doesn't provide a value. (The default can instead be given as an XPath expression in a select attribute, for example select="'index.html'"; the inner quotes make it a string literal rather than a path to a child element named index.html.)

When this template is called, an xsl:with-param child of the xsl:call-template element provides the value of the parameter using its name attribute to identify the parameter and its contents to provide a value for the parameter. For example:

<xsl:template match="ATOMIC_NUMBER">
  <xsl:call-template name="ATOM_CELL">
    <xsl:with-param name="file">atomic_number.html</xsl:with-param>
  </xsl:call-template>
</xsl:template>

Again, this is a simple example. Much more complex named templates are possible. For instance, you could define header and footer templates for the pages on a Web site and import them into many different style sheets, each of which would only have to change a few parameters such as the name of the page author, the title of the page, and the copyright date.
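Here is a sketch of parameter passing under the same assumptions: the JDK's built-in processor, a hypothetical ParamDemo class, and helium's atomic number and weight as sample data. The font element is left out of ATOM_CELL for brevity. The ATOMIC_NUMBER rule passes a file name with xsl:with-param. The ATOMIC_WEIGHT rule passes nothing, so ATOM_CELL falls back to its default, index.html.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParamDemo {
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/'>",
        "    <tr><xsl:apply-templates select='ATOM/*'/></tr>",
        "  </xsl:template>",
        "  <xsl:template match='ATOMIC_NUMBER'>",
        "    <xsl:call-template name='ATOM_CELL'>",
        "      <xsl:with-param name='file'>atomic_number.html</xsl:with-param>",
        "    </xsl:call-template>",
        "  </xsl:template>",
        "  <!-- No xsl:with-param here, so the parameter keeps its default value -->",
        "  <xsl:template match='ATOMIC_WEIGHT'>",
        "    <xsl:call-template name='ATOM_CELL'/>",
        "  </xsl:template>",
        "  <xsl:template name='ATOM_CELL'>",
        "    <xsl:param name='file'>index.html</xsl:param>",
        "    <td><b><a href='{$file}'><xsl:value-of select='.'/></a></b></td>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static final String XML =
        "<ATOM><ATOMIC_NUMBER>2</ATOMIC_NUMBER><ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT></ATOM>";

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```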

Stripping and Preserving White Space

You may have noticed that most of the examples of output have been formatted a little strangely. The reason the examples appeared strange is that the source document needed to break long elements across multiple lines to fit between the margins of this book. Unfortunately, the extra white space added to the input document carried over into the output document. For a computer, the details of insignificant white space aren't important, but for a person they can be distracting.

The default behavior for text nodes read from the input document, such as the content of an ATOMIC_NUMBER or DENSITY element, is to preserve all white space. A typical DENSITY element looks like this:

    <DENSITY UNITS="grams/cubic centimeter">
      <!-- At 300K, 1 atm -->
      0.0000899
    </DENSITY>

When its value is taken, the leading and trailing white space is included, like this, even though it's really only there to help fit on this printed page and isn't at all significant:

      0.0000899

You can use the normalize-space() function to strip the leading and trailing white space from this or any other string. It also collapses each internal run of white space into a single space. For example, instead of writing <xsl:value-of select="DENSITY"/>, you would write <xsl:value-of select="normalize-space(DENSITY)"/>.

You can also automatically delete white-space-only text nodes in the input document by using xsl:strip-space. The elements attribute of this top-level element contains a list of elements from which text nodes that contain nothing but white space should be deleted. This removes only text nodes that consist entirely of white space, so it won't trim the space around 0.0000899 in the DENSITY element shown earlier; normalize-space() is still needed for that. For example, this element says that nodes containing only white space should be stripped from DENSITY, NAME, SYMBOL, and BOILING_POINT elements:

<xsl:strip-space elements="DENSITY NAME SYMBOL BOILING_POINT"/>

You can strip space-only nodes in all elements by using the * wildcard, like this:

<xsl:strip-space elements="*"/>

There's also an xsl:preserve-space element with a similar syntax but opposite meaning. However, since preserving space is the default, this element isn’t much used. Its main purpose is to override xsl:strip-space elements imported from other style sheets or to specify a few elements where space is preserved when the default has been reset to stripping by <xsl:strip-space elements="*"/>.
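The difference between normalize-space() and xsl:strip-space is easy to see in a small sketch. It assumes the JDK's built-in javax.xml.transform processor, and WhiteSpaceDemo is a hypothetical name. The DENSITY element below has two text nodes: the white-space-only node before the comment and the node containing 0.0000899. xsl:strip-space removes only the first. normalize-space() is what trims the number.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WhiteSpaceDemo {
    // Prints the number of DENSITY text nodes, a bar, then DENSITY's normalized value.
    static String sheet(boolean strip) {
        return String.join("\n",
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
            "  <xsl:output method='text'/>",
            strip ? "  <xsl:strip-space elements='DENSITY'/>" : "",
            "  <xsl:template match='/'>",
            "    <xsl:value-of select='count(DENSITY/text())'/>",
            "    <xsl:text>|</xsl:text>",
            "    <xsl:value-of select='normalize-space(DENSITY)'/>",
            "  </xsl:template>",
            "</xsl:stylesheet>");
    }

    static final String XML = String.join("\n",
        "<DENSITY UNITS='grams/cubic centimeter'>",
        "  <!-- At 300K, 1 atm -->",
        "  0.0000899",
        "</DENSITY>");

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run(boolean strip) {
        return transform(sheet(strip), XML);
    }

    public static void main(String[] args) {
        System.out.println("without xsl:strip-space: " + run(false));
        System.out.println("with xsl:strip-space:    " + run(true));
    }
}
```

With the JDK processor, the count should drop from 2 to 1 when stripping is turned on, while the normalized value stays 0.0000899 in both runs.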

White-space-only text nodes in the style sheet, as opposed to the input document, are another matter. They are stripped by default. If you want to preserve one, attach an xml:space attribute with the value preserve to its parent element or to another one of its ancestors.

Cross-Reference

The xml:space attribute was discussed in Chapter 11.

Sometimes the easiest way to include significant white space in a style sheet is to wrap it in an xsl:text element. Space inside an xsl:text element is treated literally and never stripped.
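Here is a quick sketch of that rule, assuming the JDK's built-in processor and a hypothetical XslTextDemo class. The same ATOM rule is compiled twice: once with a bare space between the two xsl:value-of elements and once with the space wrapped in xsl:text. The bare space is a white-space-only text node in the style sheet, so it is stripped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslTextDemo {
    // Builds a style sheet whose ATOM rule puts 'separator' between NAME and SYMBOL.
    static String sheet(String separator) {
        return String.join("\n",
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
            "  <xsl:output method='text'/>",
            "  <xsl:template match='ATOM'>",
            "    <xsl:value-of select='NAME'/>" + separator + "<xsl:value-of select='SYMBOL'/>",
            "  </xsl:template>",
            "</xsl:stylesheet>");
    }

    static final String XML = "<ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>";

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run(String separator) {
        return transform(sheet(separator), XML);
    }

    public static void main(String[] args) {
        System.out.println("[" + run(" ") + "]");
        System.out.println("[" + run("<xsl:text> </xsl:text>") + "]");
    }
}
```

The first version should produce HydrogenH and the second Hydrogen H.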

Making Choices

XSLT provides two elements that allow you to change the output based on the input. The xsl:if element either does or does not output a given fragment of XML depending on what patterns are present in the input. The xsl:choose element picks one of several possible XML fragments, depending on what patterns are present in the input. Most of what you can do with xsl:if and xsl:choose can also be done by a suitable application of templates. However, sometimes the solution with xsl:if or xsl:choose is simpler and more obvious.

xsl:if

The xsl:if element provides a simple facility for changing the output based on a condition. The test attribute of xsl:if contains an expression that evaluates to a boolean. If the expression is true, the contents of the xsl:if element are output. Otherwise, they're not. For example, this template writes out the names of all ATOM elements. A comma and a space are added after all except the last element in the list. (This assumes the ATOM elements are processed with <xsl:apply-templates select="ATOM"/>. Otherwise, white-space-only text nodes between them also count toward position() and last().)

<xsl:template match="ATOM">
  <xsl:value-of select="NAME"/>
  <xsl:if test="position()!=last()">, </xsl:if>
</xsl:template>

This ensures that the list looks like "Hydrogen, Helium" and not "Hydrogen, Helium, ".

There are no xsl:else or xsl:else-if elements. The xsl:choose element provides this functionality.
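Here is a runnable sketch of the comma-separated list, assuming the JDK's built-in javax.xml.transform processor (XslIfDemo is a hypothetical class). The PERIODIC_TABLE rule selects only ATOM children, so the white space between ATOM elements in the input does not affect position() or last().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslIfDemo {
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <!-- Selecting only ATOM children keeps white-space text nodes out of the node list -->",
        "  <xsl:template match='PERIODIC_TABLE'>",
        "    <xsl:apply-templates select='ATOM'/>",
        "  </xsl:template>",
        "  <xsl:template match='ATOM'>",
        "    <xsl:value-of select='NAME'/>",
        "    <xsl:if test='position()!=last()'>, </xsl:if>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static final String XML = String.join("\n",
        "<PERIODIC_TABLE>",
        "  <ATOM><NAME>Hydrogen</NAME></ATOM>",
        "  <ATOM><NAME>Helium</NAME></ATOM>",
        "</PERIODIC_TABLE>");

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run()); // Hydrogen, Helium
    }
}
```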

xsl:choose

The xsl:choose element selects one of several possible outputs depending on several possible conditions. Each condition and its associated output template is provided by an xsl:when child element. The test attribute of the xsl:when element is an XPath expression with a boolean value. If multiple conditions are true, only the first true one is instantiated. If none of the xsl:when elements are true, the xsl:otherwise child element is instantiated if there is one; otherwise nothing is output. For example, this rule changes the color of the output based on whether the STATE attribute of the ATOM element is SOLID, LIQUID, or GAS:

<xsl:template match="ATOM">
  <xsl:choose>
    <xsl:when test="@STATE='SOLID'">
      <P style="color: black">
        <xsl:value-of select="."/>
      </P>
    </xsl:when>
    <xsl:when test="@STATE='LIQUID'">
      <P style="color: blue">
        <xsl:value-of select="."/>
      </P>
    </xsl:when>
    <xsl:when test="@STATE='GAS'">
      <P style="color: red">
        <xsl:value-of select="."/>
      </P>
    </xsl:when>
    <xsl:otherwise>
      <P style="color: green">
        <xsl:value-of select="."/>
      </P>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
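The same logic can be run as a sketch, simplified to print a color name instead of a styled P element. It assumes the JDK's built-in processor. The class name XslChooseDemo is hypothetical, and the Technetium entry has no STATE attribute so that xsl:otherwise is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslChooseDemo {
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='PERIODIC_TABLE'>",
        "    <xsl:apply-templates select='ATOM'/>",
        "  </xsl:template>",
        "  <xsl:template match='ATOM'>",
        "    <xsl:value-of select='NAME'/>",
        "    <xsl:text>:</xsl:text>",
        "    <!-- The first xsl:when whose test is true wins -->",
        "    <xsl:choose>",
        "      <xsl:when test='@STATE=\"SOLID\"'>black</xsl:when>",
        "      <xsl:when test='@STATE=\"LIQUID\"'>blue</xsl:when>",
        "      <xsl:when test='@STATE=\"GAS\"'>red</xsl:when>",
        "      <xsl:otherwise>green</xsl:otherwise>",
        "    </xsl:choose>",
        "    <xsl:text> </xsl:text>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static final String XML = "<PERIODIC_TABLE>"
        + "<ATOM STATE='SOLID'><NAME>Iron</NAME></ATOM>"
        + "<ATOM STATE='LIQUID'><NAME>Mercury</NAME></ATOM>"
        + "<ATOM STATE='GAS'><NAME>Hydrogen</NAME></ATOM>"
        + "<ATOM><NAME>Technetium</NAME></ATOM>"
        + "</PERIODIC_TABLE>";

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run()); // Iron:black Mercury:blue Hydrogen:red Technetium:green
    }
}
```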

Merging Multiple Style Sheets

A single XML document may use many different markup vocabularies described in many different DTDs. You may wish to use different standard style sheets for those different vocabularies. However, you'll also want style rules for particular documents as well. The xsl:import and xsl:include elements enable you to merge multiple style sheets so that you can organize and reuse style sheets for different vocabularies and purposes.

Importing with xsl:import

The xsl:import element is a top-level element whose href attribute provides the URI of a style sheet to import. All xsl:import elements must appear before any other top-level element in the xsl:stylesheet root element. For example, these xsl:import elements import the style sheets genealogy.xsl and standards.xsl.

<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="genealogy.xsl"/>
  <xsl:import href="standards.xsl"/>
  <!-- other child elements follow -->
</xsl:stylesheet>

Rules in the imported style sheets may conflict with rules in the importing style sheet. If so, rules in the importing style sheet take precedence. If two rules in different imported style sheets conflict, then the rule in the last style sheet imported (standards.xsl above) takes precedence.

The xsl:apply-imports element is a restricted variant of xsl:apply-templates that only uses imported rules. It does not use any rules from the importing style sheet. This allows access to imported rules that would otherwise be overridden by rules in the importing style sheet. Unlike xsl:apply-templates, however, in XSLT 1.0 xsl:apply-imports has no select or mode attribute and is always empty. It processes only the current node, in the current mode, and it can be used only inside a template rule, where it considers only the rules imported into the style sheet that contains that rule.
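The precedence rules can be checked with a sketch that keeps the style sheets in memory. It assumes the JDK's built-in javax.xml.transform processor. The map and the URIResolver lambda are assumptions that stand in for real genealogy.xsl and standards.xsl files, and the one-word template bodies are invented. The importing style sheet overrides both imported ATOM rules, and its xsl:apply-imports then picks the rule from standards.xsl, the style sheet imported last.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ImportPrecedenceDemo {
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Hypothetical in-memory stand-ins for the two imported files.
    static final Map<String, String> SHEETS = Map.of(
        "genealogy.xsl", HEAD + "<xsl:template match='ATOM'>genealogy</xsl:template></xsl:stylesheet>",
        "standards.xsl", HEAD + "<xsl:template match='ATOM'>standards</xsl:template></xsl:stylesheet>");

    static final String MAIN = String.join("\n", HEAD,
        "  <xsl:import href='genealogy.xsl'/>",
        "  <xsl:import href='standards.xsl'/>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='/'><xsl:apply-templates select='ATOM'/></xsl:template>",
        "  <!-- Beats both imported ATOM rules, then defers to the best imported one -->",
        "  <xsl:template match='ATOM'>main(<xsl:apply-imports/>)</xsl:template>",
        "</xsl:stylesheet>");

    static String run() {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Resolve xsl:import hrefs from the map instead of the file system.
        factory.setURIResolver((href, base) -> SHEETS.containsKey(href)
                ? new StreamSource(new StringReader(SHEETS.get(href)), href)
                : null);
        try {
            StringWriter out = new StringWriter();
            factory.newTransformer(new StreamSource(new StringReader(MAIN)))
                   .transform(new StreamSource(new StringReader("<ATOM/>")),
                              new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the processor follows the XSLT 1.0 precedence rules, the output is main(standards).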

Inclusion with xsl:include

The xsl:include element is a top-level element that copies another style sheet into the current style sheet at the point where it occurs. (More precisely, it copies the contents of the xsl:stylesheet or xsl:transform element in the included document into the current document.) Its href attribute provides the URI of the style sheet to include. An xsl:include element can occur anywhere at the top level after the last xsl:import element.

Unlike rules imported by xsl:import elements, rules included by xsl:include elements have the same precedence in the including style sheet that they would have if they were copied and pasted from one style sheet to the other. As far as the XSLT processor is concerned, there is no difference between an included rule and a rule that's physically present.
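This sketch contrasts the two under the same assumptions: the JDK's built-in processor, in-memory style sheets served by a URIResolver, and the hypothetical file names base.xsl and local.xsl. Both files define a rule for ATOM. The included rule takes on the including style sheet's precedence, which is higher than that of anything it imports, so it wins.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IncludePrecedenceDemo {
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Hypothetical in-memory files: one imported, one included.
    static final Map<String, String> SHEETS = Map.of(
        "base.xsl", HEAD + "<xsl:template match='ATOM'>imported</xsl:template></xsl:stylesheet>",
        "local.xsl", HEAD + "<xsl:template match='ATOM'>included</xsl:template></xsl:stylesheet>");

    static final String MAIN = String.join("\n", HEAD,
        "  <xsl:import href='base.xsl'/>",
        "  <!-- xsl:include may appear anywhere after the last xsl:import -->",
        "  <xsl:include href='local.xsl'/>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='/'><xsl:apply-templates select='ATOM'/></xsl:template>",
        "</xsl:stylesheet>");

    static String run() {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Resolve xsl:import and xsl:include hrefs from the map.
        factory.setURIResolver((href, base) -> SHEETS.containsKey(href)
                ? new StreamSource(new StringReader(SHEETS.get(href)), href)
                : null);
        try {
            StringWriter out = new StringWriter();
            factory.newTransformer(new StreamSource(new StringReader(MAIN)))
                   .transform(new StreamSource(new StringReader("<ATOM/>")),
                              new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // included
    }
}
```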

Embedding with xsl:stylesheet

You can directly include an XSLT style sheet in the XML document it applies to. I don't recommend this in practice, and browsers and XSLT processors are not required to support it. Nonetheless, a few do. To use this, the xsl:stylesheet element must appear as a child of the document element, rather than as a root element itself. It has an id attribute giving it a unique name, and this id appears as the value of the href attribute in the xml-stylesheet processing instruction, following the fragment identifier separator #. Listing 17-21 demonstrates.

Listing 17-21: An XSLT style sheet embedded in an XML document

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="#mystyle"?>
<PERIODIC_TABLE>
  <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    id="mystyle">
    <xsl:template match="/">
      <html>
        <xsl:apply-templates/>
      </html>
    </xsl:template>
    <xsl:template match="PERIODIC_TABLE">
      <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="ATOM">
      <P>
        <xsl:value-of select="."/>
      </P>
    </xsl:template>
    <!--Don't display the style sheet itself
       or its descendants-->
    <xsl:template match="xsl:stylesheet"/>
  </xsl:stylesheet>
  <ATOM>
    <NAME>Actinium</NAME>
    <ATOMIC_WEIGHT>227</ATOMIC_WEIGHT>
    <ATOMIC_NUMBER>89</ATOMIC_NUMBER>
    <OXIDATION_STATES>3</OXIDATION_STATES>
    <BOILING_POINT UNITS="Kelvin">3470</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">1324</MELTING_POINT>
    <SYMBOL>Ac</SYMBOL>
    <DENSITY UNITS="grams/cubic centimeter"><!-- At 300K -->
      10.07
    </DENSITY>
    <ELECTRONEGATIVITY>1.1</ELECTRONEGATIVITY>
    <ATOMIC_RADIUS UNITS="Angstroms">1.88</ATOMIC_RADIUS>
  </ATOM>
</PERIODIC_TABLE>

Output Methods

Most of the examples in this chapter have focused on transforming XML into well-formed HTML. However, most XSLT processors actually support three different output methods:

XML
HTML
Text

The XSLT processor behaves differently depending on which of these output methods it uses. The XML format is the default and in many ways the simplest. The output is mostly exactly what you request in your style sheet. Because well-formed XML does not permit raw less-than signs and ampersands, if you use a character reference such as &#60; or the entity reference &lt; to insert the < character, the formatter will output &lt; or perhaps &#60;. If you use a character reference such as &#38; or the entity reference &amp; to insert the & character, the formatter will insert &amp; or perhaps &#38;. There are ways to disable this escaping, though, as you'll see later.

The HTML output method is designed to output standard HTML 4.0. This is not the well-formed HTML used in this book, but rather traditional HTML in which empty tags look like <HR> and <IMG> instead of <HR/> and <IMG/>, processing instructions are terminated with a > instead of ?>, and < signs used in JavaScript are not converted to &lt;. This makes it much easier to output HTML that works across many browsers and platforms without odd effects such as double lines where a single line is expected or other detritus caused by forcing HTML into the XML mold. The HTML output method is automatically selected when the formatter notices that the root output element is html, HTML, HtMl, or any other combination of case that still spells html, provided that the element is not in a namespace and no text other than white space precedes it.

The final output method is pure text. The text output method operates by first forming a full result tree as per the XML output method, but then only outputting the string value of that tree. This is useful for transforming to non-XML formats such as RTF or TeX. The primary benefit of the text output format is that less than signs are not converted to &lt; or &#60; and ampersands are not converted to &amp; or &#38;. This allows you to output effectively arbitrary text.

xsl:output

By default an XSLT processor will use the XML output method, unless it recognizes the output root element as HTML, in which case it uses the HTML output method. You can change this by using a top-level xsl:output element. The method attribute of the xsl:output element specifies which output method to use and normally has one of these three values:

xml
html
text

Formatting engines may support other values as well, but so far none do. For example, to specify that you want pure well-formed HTML as output, with all the empty tags properly indicated, all less than signs escaped, and so forth, you would use this xsl:output element at the top level of your style sheet:

<xsl:output method="xml"/>

To indicate that you want regular HTML output even though you aren't using an html root element, you'd put this xsl:output element at the top level of your style sheet:

<xsl:output method="html"/>
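To compare the output methods, the following sketch compiles the same template with method set to xml, html, and text, plus a fourth version with no xsl:output at all. It assumes the JDK's built-in javax.xml.transform processor. OutputMethodDemo and the sample markup are hypothetical. The xml method should write the empty hr element as <hr/> and escape the < inside script. The html method should write <hr> and leave that < alone. The text method drops the markup entirely. The version without xsl:output should fall back to HTML because the result's root element is html.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputMethodDemo {
    // A null method leaves out xsl:output so the processor chooses the method itself.
    static String sheet(String method) {
        String output = method == null ? ""
            : "  <xsl:output method='" + method + "' omit-xml-declaration='yes'/>";
        return String.join("\n",
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
            output,
            "  <xsl:template match='/'>",
            "    <html><body>",
            "      <hr/>",
            "      <script><xsl:text>if (a &lt; b) go();</xsl:text></script>",
            "      <p><xsl:text>a &lt; b &amp;&amp; c</xsl:text></p>",
            "    </body></html>",
            "  </xsl:template>",
            "</xsl:stylesheet>");
    }

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run(String method) {
        return transform(sheet(method), "<ATOM/>");
    }

    public static void main(String[] args) {
        for (String method : new String[] {"xml", "html", "text", null}) {
            System.out.println(method + ": " + run(method));
        }
    }
}
```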

The xsl:output element also has a number of other allowed attributes that modify how XML is output. These allow you to change the prolog of the document, how the output is indented with insignificant white space, and which elements use CDATA sections rather than escaping < and & characters.

XML Declaration

Four attributes of xsl:output format the XML declaration used in your document. This assumes the output method is xml. These attributes are:

omit-xml-declaration
version
standalone
encoding

The omit-xml-declaration attribute has the value yes or no. If yes, then an XML declaration is not included in the output document. If no, then it is. For example, to insert a very basic <?xml version="1.0"?> XML declaration in the output document you would use this xsl:output element at the top level of your style sheet:

<xsl:output method="xml" omit-xml-declaration="no"/>

You could also include it as two separate xsl:output elements like this:

<xsl:output method="xml"/>
<xsl:output omit-xml-declaration="no"/>

The default value of the version attribute of the XML declaration is 1.0. Currently, that's the only value allowed. If at some point in the future that changes, then the version attribute of xsl:output will allow you to change the version used in the XML declaration. For example,

<xsl:output version="1.1"/>

You can set the standalone attribute of the XML declaration to the value yes or no using the standalone attribute of the xsl:output element. For example, this xsl:output element would insert the XML declaration <?xml version="1.0" standalone="yes"?>:

<xsl:output method="xml"
            omit-xml-declaration="no" standalone="yes"/>

The final possible piece of an XML declaration is the encoding declaration. As you probably guessed, this can be set with the encoding attribute of the xsl:output element. The value can be any legal encoding name registered with the Internet Assigned Numbers Authority as discussed in Chapter 7. For example, to insert the XML declaration <?xml version="1.0" encoding="ISO-8859-1"?>, you'd use this xsl:output element:

<xsl:output method="xml"
          omit-xml-declaration="no" encoding="ISO-8859-1"/>

This also changes the encoding the XSLT processor uses for the output document from its default UTF-8. However, not all processors support all possible encodings. Those written in Java are likely to support the most encodings because Java's rich class library makes it almost trivial to support several dozen popular encodings.
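Here is a sketch of these attributes, assuming the JDK's built-in processor and a hypothetical XmlDeclarationDemo class. The JDK processor typically writes an encoding pseudo-attribute in the declaration even when you don't ask for one, so the sketch prints both declarations rather than assuming one exact form. Because the result goes to a StringWriter, the output is characters rather than bytes. Writing to an OutputStream instead would also make the processor encode the bytes as ISO-8859-1.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlDeclarationDemo {
    // Builds a style sheet whose xsl:output carries the given attributes.
    static String run(String outputAttributes) {
        String xsl = String.join("\n",
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
            "  <xsl:output method='xml' " + outputAttributes + "/>",
            "  <xsl:template match='/'><PERIODIC_TABLE/></xsl:template>",
            "</xsl:stylesheet>");
        return transform(xsl, "<ATOM/>");
    }

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("omit-xml-declaration='yes'"));
        System.out.println(run("omit-xml-declaration='no' standalone='yes' encoding='ISO-8859-1'"));
    }
}
```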

Document type declaration

XSLT does not provide any elements for building an internal DTD subset for the output document with <!ELEMENT>, <!ATTLIST>, <!ENTITY>, and <!NOTATION> declarations. However, it does provide two attributes of the xsl:output element you can use to include a DOCTYPE declaration that points to an external DTD. These are doctype-system and doctype-public. The first inserts a SYSTEM identifier for the DTD; the second a PUBLIC identifier. For example, suppose you want this DOCTYPE declaration in your output document:

<!DOCTYPE PERIODIC_TABLE SYSTEM "chemistry.dtd">

Then you would use this xsl:output element at the top level of your style sheet:

<xsl:output doctype-system="chemistry.dtd"/>

The XSLT processor determines the proper root element for the document type declaration by looking at the root element of the output tree. Using a full URL instead of a relative URL is equally easy:

<xsl:output
  doctype-system="http://www.mysite.com/chemistry.dtd"/>

On the other hand, suppose you want this DOCTYPE declaration in your output document:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
         "http://www.w3.org/TR/REC-html40/loose.dtd">

Then you would use both doctype-system and doctype-public attributes so your DOCTYPE declaration will have both a PUBLIC and a SYSTEM identifier. For example,

<xsl:output
  doctype-system="http://www.w3.org/TR/REC-html40/loose.dtd"
  doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>
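Both forms can be checked with a short sketch, assuming the JDK's built-in processor (DoctypeDemo is a hypothetical class). The processor takes the name in the DOCTYPE declaration from the result's root element: PERIODIC_TABLE in the first run and html in the second.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DoctypeDemo {
    // Outputs an empty 'root' element under an xsl:output with the given doctype attributes.
    static String run(String root, String doctypeAttributes) {
        String xsl = String.join("\n",
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
            "  <xsl:output method='xml' omit-xml-declaration='yes' " + doctypeAttributes + "/>",
            "  <xsl:template match='/'><" + root + "/></xsl:template>",
            "</xsl:stylesheet>");
        return transform(xsl, "<ATOM/>");
    }

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("PERIODIC_TABLE", "doctype-system='chemistry.dtd'"));
        System.out.println(run("html",
            "doctype-system='http://www.w3.org/TR/REC-html40/loose.dtd' "
            + "doctype-public='-//W3C//DTD HTML 4.0 Transitional//EN'"));
    }
}
```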

Indentation

The indentation of most of the output examples in this chapter has been more than a little flaky. It's certainly not as neat as the carefully hand-coded indentation of the input documents. However, if white space isn't particularly significant in your output document, you can change this and ask the formatter for "pretty printed" XML with the nesting of different elements indicated by the indentation. This is accomplished by the indent attribute of the xsl:output element. If this attribute has the value yes (the default is no for the XML output method, though it is yes for the HTML output method), then the processor is allowed (but not required) to insert (but not remove) extra white space into the output to try to "pretty print" the output. This may include indentation and line breaks. For example,

<xsl:output indent="yes"/>

You cannot, however, specify how much you want each level indented (for example, by two spaces or one tab). That's up to the formatter. Together, the xsl:strip-space element and the indent attribute of the xsl:output element allow you to produce output that's almost as attractive as the most painstakingly hand-crafted XML.

CDATA sections

Standard XSLT does not allow you to insert CDATA sections at arbitrary locations in XML documents produced by XSL transformations. However, you can specify that the text contents of a particular element be placed in a CDATA section. In this case the < and & symbols are not encoded as &lt; and &amp; as they would normally be. To do this, place the name of the element whose text contents should be wrapped in CDATA delimiters in the cdata-section-elements attribute of the xsl:output element. For example, this xsl:output element says that the contents of the SCRIPT element should be wrapped in a CDATA section:

<xsl:output cdata-section-elements="SCRIPT"/>

You can enclose multiple names of elements whose text contents should be wrapped in CDATA delimiters in one cdata-section-elements attribute simply by separating the names with white space. For example, this xsl:output element says that the contents of both the SCRIPT and CODE elements should be wrapped in a CDATA section:

<xsl:output cdata-section-elements="SCRIPT CODE"/>

Alternately, you can just use multiple xsl:output elements, each naming one element. For example:

<xsl:output cdata-section-elements="SCRIPT"/>
<xsl:output cdata-section-elements="CODE"/>
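Here is a sketch of cdata-section-elements, assuming the JDK's built-in processor. CdataSectionDemo and the PAGE, SCRIPT, and P markup are hypothetical. Text inside SCRIPT should come out wrapped in a CDATA section with raw < and & characters, while the same kind of text inside P is escaped as usual.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CdataSectionDemo {
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <!-- Text children of SCRIPT and CODE are written as CDATA sections -->",
        "  <xsl:output method='xml' omit-xml-declaration='yes'",
        "              cdata-section-elements='SCRIPT CODE'/>",
        "  <xsl:template match='/'>",
        "    <PAGE>",
        "      <SCRIPT><xsl:text>if (a &lt; b &amp;&amp; c) go();</xsl:text></SCRIPT>",
        "      <P><xsl:text>a &lt; b</xsl:text></P>",
        "    </PAGE>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Compiles the style sheet and applies it to the input, returning the output text.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, "<ATOM/>");
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```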

Media type

One final xsl:output attribute specifies the MIME media type of the output document. This is media-type. Mostly this will have the value text/xml, but it might be text/html for the HTML output method, text/plain for the text output method, or even something else such as text/rtf. You should not specify a charset parameter for the media type. The formatting engine should determine this from the encoding attribute of the xsl:output element. For example, this xsl:output element specifies that the output document has the text/rtf MIME type:

<xsl:output media-type="text/rtf"/>

Depending on external context, this may determine the filename extension, the icon of the file, how an HTTP server handles the file, or something else. Then again, it might have no effect at all. The XSLT processor might ignore this request and output the same byte stream or XML tree regardless of media type. This is something that's important to the environment in which the XML document exists, but not so important to the XML document itself.

Summary

In this chapter, you learned about XSL transformations. In particular, you learned that:

The next chapter takes up the second half of XSL: the formatting objects vocabulary. Formatting objects is an extremely powerful way of specifying the precise layout you want your pages to have. XSL transformations are used to transform an XML document into an XSL formatting object document.



Copyright 2001, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified December 31, 2002