The Extensible Stylesheet Language (XSL) includes both a transformation language and a formatting language. Each of these, naturally enough, is an XML application. The transformation language provides elements that define rules for how one XML document is transformed into another XML document. The transformed XML document may use the markup and DTD of the original document, or it may use a completely different set of elements. In particular, it may use the elements defined by the second part of XSL, the formatting objects. This chapter discusses the transformation language half of XSL.
The transformation and formatting halves of XSL can function independently of each other. For instance, the transformation language can transform an XML document into a well-formed HTML file, and completely ignore XSL formatting objects. This is the style of XSL previewed in Chapter 5 and emphasized in this chapter. Furthermore, it's not absolutely required that a document written in XSL formatting objects be produced by using the transformation part of XSL on another XML document. For example, it's easy to imagine a converter written in Java that reads TeX or PDF files and translates them into XSL formatting objects (though no such converters exist as of early 2001).
In essence, XSL is two languages, not one. The first language is a transformation language, the second a formatting language. The transformation language is useful independent of the formatting language. Its ability to move data from one XML representation to another makes it an important component of XML-based electronic commerce, electronic data interchange, metadata exchange, and any application that needs to convert between different XML representations of the same data. These uses are also united by their lack of concern with rendering data on a display for humans to read. They are purely about moving data from one computer system or program to another.
Consequently, many early implementations of XSL focus exclusively on the transformation part and ignore the formatting objects. These are incomplete implementations, but nonetheless useful. Not all data must ultimately be rendered on a computer monitor or printed on paper.
Cross-Reference
Chapter 18 discusses the XSL formatting language.
A Word of Caution about XSL
XSL is still under development. The language has changed radically in the past, and will almost certainly change again in the future. This chapter is based on the November 16, 1999 XSLT 1.0 Recommendation. Because XSLT is now an official Recommendation of the World Wide Web Consortium (W3C), I'm hopeful that any changes that do occur will simply add to the existing syntax without invalidating style sheets that adhere to the 1.0 spec. Indeed the W3C has just begun work on XSLT 1.1 and 2.0, and it does seem likely that all legal XSLT 1.0 documents will still be legal XSLT 1.1 and 2.0 documents.
Not all software has caught up to the 1.0 Recommendation, however. In particular, Versions 5.5 and earlier of Internet Explorer implement only a very old working draft of XSLT that looks almost nothing like the finished standard. You should not expect most of the examples in this chapter to work with IE, even after substantial tweaking. Moreover, the language that IE does implement is not XSLT, and any book or person that tells you otherwise is telling you an untruth. Both Microsoft's live presentations and the written documentation it posts on its Web site are notorious for teaching nonstandard Microsoft versions of XSLT (and other languages) without clearly distinguishing which parts are real XSLT and which are Microsoft extensions to (some would say perversions of) standard XSLT.
In November 2000 Microsoft released MSXML 3.0, an XML parser/XSLT processor for IE that does come much closer to supporting XSLT 1.0. You can download it from http://msdn.microsoft.com/xml/general/xmlparser.asp. However, there are still some bugs and areas where Microsoft did not follow the specification, so this is not quite a complete implementation of XSLT 1.0. More importantly, MSXML 3.0 is not bundled with IE5.5; and even if you install it, it does not automatically replace the earlier, non-standard-compliant version of MSXML that is bundled. To replace the old version, you have to download and run a separate program called xmlinst.exe, which you can get from the same page where you found MSXML 3.0.
In an XSL transformation, an XSLT processor reads both an XML document and an XSLT style sheet. Based on the instructions the processor finds in the XSLT style sheet, it outputs a new XML document or fragment thereof. There's also special support for outputting HTML. With some effort most XSLT processors can also be made to output essentially arbitrary text, though XSLT is designed primarily for XML-to-XML and XML-to-HTML transformations.
As you learned in Chapter 6, every well-formed XML document is a tree. A tree is a data structure composed of connected nodes beginning with a top node called the root. The root is connected to its child nodes, each of which is connected to zero or more children of its own, and so forth. Nodes that have no children of their own are called leaves. A diagram of a tree looks much like a genealogical descendant chart that lists the descendants of a single ancestor. The most useful property of a tree is that each node and its children also form a tree. Thus, a tree is a hierarchical structure of trees in which each tree is built out of smaller trees.
For the purposes of XSLT, elements, attributes, namespaces, processing instructions, comments, and text are counted as nodes. Furthermore, the root of the document must be distinguished from the root element. Thus, XSLT processors model an XML document as a tree that contains seven kinds of nodes:
1. The root
2. Elements
3. Text
4. Attributes
5. Namespaces
6. Processing instructions
7. Comments
The Document Type Definition (DTD) and document type declaration are specifically not included in this tree. However, a DTD may add default attribute values to some elements, which then become additional attribute nodes in the tree.
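The effect of a DTD default on the tree can be checked with a short program. The sketch below assumes a current JDK and uses the XSLT 1.0 processor that ships with it, reached through the standard javax.xml.transform API, rather than Xalan itself; the class name DefaultAttributeDemo and the one-element document are invented for illustration. The instance never writes a UNITS attribute, yet the style sheet can read one, because the parser supplies the default from the internal DTD subset before the XSLT processor builds its tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DefaultAttributeDemo {

    // UNITS is declared with a default value but never written on DENSITY itself.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE DENSITY [\n"
      + "  <!ELEMENT DENSITY (#PCDATA)>\n"
      + "  <!ATTLIST DENSITY UNITS CDATA \"grams/cubic centimeter\">\n"
      + "]>\n"
      + "<DENSITY>0.0000899</DENSITY>\n";

    // Text output method, so the result is nothing but the attribute's value.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"DENSITY/@UNITS\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies a style sheet held in a string to a document held in a string.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```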
For example, consider the XML document in Listing 17-1. This shows part of the periodic table of the elements. I’ll be using this as an example in this chapter.
On the CD-ROM
The complete periodic table appears on the CD-ROM in the file allelements.xml in the examples/periodic_table directory.
The root PERIODIC_TABLE element contains ATOM child elements. Each ATOM element contains several child elements providing the atomic number, atomic weight, symbol, boiling point, and so forth.
A UNITS attribute specifies the units for those elements that have units.
Note
ELEMENT would be a more appropriate name here than ATOM. However, writing about ELEMENT elements and trying to distinguish between chemical elements and XML elements might create confusion. Thus, at least for the purposes of this chapter, ATOM seemed like the more legible option.
Listing 17-1: An XML periodic table with two atoms: hydrogen and helium
<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="17-2.xsl"?>
<PERIODIC_TABLE>
  <ATOM STATE="GAS">
    <NAME>Hydrogen</NAME>
    <SYMBOL>H</SYMBOL>
    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
    <BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
    <DENSITY UNITS="grams/cubic centimeter">
      <!-- At 300K, 1 atm -->
      0.0000899
    </DENSITY>
  </ATOM>
  <ATOM STATE="GAS">
    <NAME>Helium</NAME>
    <SYMBOL>He</SYMBOL>
    <ATOMIC_NUMBER>2</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT>
    <BOILING_POINT UNITS="Kelvin">4.216</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">0.95</MELTING_POINT>
    <DENSITY UNITS="grams/cubic centimeter"><!-- At 300K -->
      0.0001785
    </DENSITY>
  </ATOM>
</PERIODIC_TABLE>
Figure 17-1 displays a tree diagram of this document. It begins at the top with the root node (not the same as the root element!), which contains two child nodes: the xml-stylesheet processing instruction and the root element PERIODIC_TABLE. (The XML declaration is not visible to the XSLT processor and is not included in the tree the XSLT processor operates on.)
The PERIODIC_TABLE element contains two child nodes, both ATOM elements. Each ATOM element has an attribute node for its STATE attribute, and a variety of child element nodes. Each child element contains a node for its contents, as well as nodes for any attributes, comments, and processing instructions it possesses. Notice in particular that many nodes are something other than elements. There are nodes for text, attributes, comments, namespaces, and processing instructions. Unlike CSS, XSL is not limited to working only with whole elements. It has a much more granular view of a document that enables you to base styles on comments, attributes, processing instructions, element content, and more.
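To make this granular view concrete, here is a minimal sketch that counts each kind of node in Listing 17-1 with the XPath API in javax.xml.xpath. XPath is the same language XSLT uses for its patterns and select expressions, so the node tests comment(), processing-instruction(), and @* mean the same thing in a style sheet. It assumes a current JDK; the class name NodeKinds is hypothetical, and the listing is copied into a string with its white space rearranged, which does not affect these counts.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NodeKinds {

    // Listing 17-1, with the white space inside each ATOM rearranged.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>\n"
      + "<PERIODIC_TABLE>\n"
      + "  <ATOM STATE=\"GAS\">\n"
      + "    <NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL>\n"
      + "    <ATOMIC_NUMBER>1</ATOMIC_NUMBER><ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>\n"
      + "    <BOILING_POINT UNITS=\"Kelvin\">20.28</BOILING_POINT>\n"
      + "    <MELTING_POINT UNITS=\"Kelvin\">13.81</MELTING_POINT>\n"
      + "    <DENSITY UNITS=\"grams/cubic centimeter\"><!-- At 300K, 1 atm -->0.0000899</DENSITY>\n"
      + "  </ATOM>\n"
      + "  <ATOM STATE=\"GAS\">\n"
      + "    <NAME>Helium</NAME><SYMBOL>He</SYMBOL>\n"
      + "    <ATOMIC_NUMBER>2</ATOMIC_NUMBER><ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT>\n"
      + "    <BOILING_POINT UNITS=\"Kelvin\">4.216</BOILING_POINT>\n"
      + "    <MELTING_POINT UNITS=\"Kelvin\">0.95</MELTING_POINT>\n"
      + "    <DENSITY UNITS=\"grams/cubic centimeter\"><!-- At 300K -->0.0001785</DENSITY>\n"
      + "  </ATOM>\n"
      + "</PERIODIC_TABLE>\n";

    // Parses XML and evaluates a numeric XPath expression such as count(//comment()).
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "count(//*)",                        // element nodes
            "count(//@*)",                       // attribute nodes
            "count(//comment())",                // comment nodes
            "count(//processing-instruction())"  // PI nodes; the XML declaration is not one
        };
        for (String e : expressions) {
            System.out.println(e + " = " + count(e));
        }
    }
}
```

Run against this copy of the listing, the counts should be 17 elements, 8 attributes (one STATE and three UNITS per ATOM), 2 comments, and 1 processing instruction. The XML declaration does not show up as a processing instruction.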
Note
Like the XML declaration, an internal DTD subset or DOCTYPE declaration is not part of the tree. However, it may have the effect of adding attribute nodes to some elements through <!ATTLIST> declarations that use #FIXED or default attribute values.

Figure 17-1: Listing 17-1 as a tree diagram
XSLT operates by transforming one XML tree into another XML tree. More precisely, an XSLT processor accepts as input a tree represented as an XML document and produces as output a new tree, also represented as an XML document. Consequently, the transformation part of XSL is also called the tree construction part. The XSL transformation language contains operators for selecting nodes from the tree, reordering the nodes, and outputting nodes. If one of these nodes is an element node, then it may be an entire tree itself. Remember that all these operators, both for input and output, are designed for operation on a tree.
The input must be an XML document. You cannot use XSLT to transform from non-XML formats such as PDF, TeX, Microsoft Word, PostScript, MIDI, or others. HTML and SGML are borderline cases because they're so close to XML. XSLT can work with HTML and SGML documents that satisfy XML's well-formedness rules. However, XSLT cannot handle the wide variety of non-well-formed HTML and SGML that you encounter on most Web sites and document production systems. XSLT is not a general-purpose regular expression language for transforming arbitrary data.
Most of the time the output of an XSLT transformation is also an XML document. However, it can also be a result tree fragment that could be used as an external parsed entity in another XML document. (That is, it would be a well-formed XML document if it were enclosed in a single root element.) In other words, the output may not necessarily be a well-formed XML document, but it will at least be a plausible part of a well-formed XML document. An XSLT transformation cannot output text that is malformed XML such as
<B><I>Tag Mismatch!</B></I>
Tip
The xsl:output element and disable-output-escaping attribute discussed below loosen this restriction somewhat.
Most XSLT processors also support output as HTML and/or raw text, although the standard does not require them to do so. To some extent this allows you to transform to non-XML formats like TeX, RTF, or PostScript. However, XSLT is not designed to make these transformations easy. It is designed for XML-to-XML transformations. If you need a non-XML output format, it will probably be easier to use XSLT to transform the XML to an intermediate format like TeXML (http://www.alphaworks.ibm.com/tech/texml), and then use additional, non-XSLT software to transform that into the format you want.
An XSLT document contains template rules. A template rule has a pattern specifying the nodes it matches and a template to be instantiated and output when the pattern is matched. When an XSLT processor transforms an XML document using an XSL style sheet, it walks the XML document tree, looking at each node in turn. As each node in the XML document is read, the processor compares it with the pattern of each template rule in the style sheet. When the processor finds a node that matches a template rule's pattern, it outputs the rule's template. This template generally includes some markup, some new data, and some data copied out of the source XML document.
XSLT uses XML to describe these rules, templates, and patterns. The root element of the XSLT document is either a stylesheet or a transform element in the http://www.w3.org/1999/XSL/Transform namespace. By convention this namespace is mapped to the xsl prefix, but you're free to pick another prefix if you prefer. In this chapter, I always use the xsl prefix. From this point forward it should be understood that the prefix xsl is mapped to the http://www.w3.org/1999/XSL/Transform namespace.
Tip
If you get the namespace URI wrong, either by using a URI from an older draft of the specification, such as http://www.w3.org/TR/WD-xsl, or simply by making a typo in the normal URI, the XSLT processor will output the style sheet document itself instead of the transformed input document. This is the result of the interaction between several obscure sections of the XSLT 1.0 specification. The details aren’t important. What is important is that this very unusual behavior looks very much like a bug in the processor if you aren’t familiar with it. If you are familiar with it, fixing it is trivial; just correct the namespace URI to http://www.w3.org/1999/XSL/Transform.
Each template rule is an xsl:template element. The pattern of the rule is placed in the match attribute of the xsl:template element. The output template is the content of the xsl:template element. All instructions in the template for doing things such as selecting parts of the input tree to include in the output tree are performed by one XSLT element or another. These are identified by the xsl: prefix on the element names. Elements that do not have an xsl: prefix are part of the result tree.
Listing 17-2 shows a very simple XSLT style sheet with two template rules. The first template rule matches the root element PERIODIC_TABLE. It replaces this element with an html element. The contents of the html element are the results of applying the other templates in the document to the contents of the PERIODIC_TABLE element.
The second template matches ATOM elements. It replaces each ATOM element in the input document with a P element in the output document. The xsl:apply-templates element inside it processes the children of the matched ATOM. Because no template in this style sheet matches NAME, SYMBOL, or the other child elements, XSLT's built-in template rules take over and copy their text into the output document. Thus, the contents of a P element will be the text (but not the markup) contained in the corresponding ATOM element.
The xsl:stylesheet root element has two required attributes, version and xmlns:xsl, each of which must have exactly the values shown here (1.0 for version and http://www.w3.org/1999/XSL/Transform for xmlns:xsl). I'll discuss the exact syntax of all these elements and attributes below.
Listing 17-2: An XSLT style sheet for the periodic table with two template rules
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="PERIODIC_TABLE">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="ATOM">
    <P>
      <xsl:apply-templates/>
    </P>
  </xsl:template>
</xsl:stylesheet>
The xsl:transform element can be used in place of xsl:stylesheet if you prefer. This is an exact synonym with the same syntax, semantics, and attributes. For example,
<?xml version="1.0"?>
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- templates go here -->
</xsl:transform>
In this book, I will stick to xsl:stylesheet.
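The claim that the two root elements are interchangeable is easy to check mechanically. This sketch builds the same one-rule style sheet twice, once rooted at xsl:stylesheet and once at xsl:transform, and runs both through the XSLT processor bundled with the JDK via javax.xml.transform. The class name TransformSynonym and the stripped-down two-atom document are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformSynonym {

    // A stripped-down periodic table: two ATOMs with just a NAME each.
    static final String XML =
        "<PERIODIC_TABLE><ATOM><NAME>Hydrogen</NAME></ATOM>"
      + "<ATOM><NAME>Helium</NAME></ATOM></PERIODIC_TABLE>";

    // Builds the same style sheet, rooted at xsl:stylesheet or xsl:transform.
    static String styleSheet(String rootName) {
        return "<xsl:" + rootName + " version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"text\"/>"
             + "<xsl:template match=\"ATOM\">[<xsl:value-of select=\"NAME\"/>]</xsl:template>"
             + "</xsl:" + rootName + ">";
    }

    // Applies a style sheet held in a string to a document held in a string.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, styleSheet("stylesheet")));
        System.out.println(transform(XML, styleSheet("transform")));
    }
}
```

Both lines should print [Hydrogen][Helium].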
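There are three primary ways to transform XML documents into other formats, such as HTML, with an XSLT style sheet:
1. The XML document and associated style sheet are both served to the client (Web browser), which then transforms the document as specified by the style sheet and presents it to the user.
2. The server applies an XSLT style sheet to an XML document to transform it to some other format (generally HTML) and sends the transformed document to the client (Web browser).
3. A third program transforms the original XML document into some other format (often HTML) before the document is placed on the server. Both server and client deal only with the transformed document.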
Each of these three approaches uses different software, although they all use the same XML documents and XSLT style sheets.
An ordinary Web server sending XML documents to Internet Explorer is an example of the first approach. A servlet-compatible Web server using the IBM alphaWorks' XML Enabler (http://www.alphaworks.ibm.com/tech/xmlenabler) is an example of the second approach. A human using Michael Kay's command line SAXON program (http://users.iclway.co.uk/mhkay/saxon/) to transform XML documents to HTML documents, then placing the HTML documents on a Web server, is an example of the third approach. However, these all use (at least in theory) the same XSLT language.
In this chapter, I emphasize the third approach, primarily because at the time of this writing, specialized converter programs such as Michael Kay's SAXON and the XML Apache Project's Xalan (http://xml.apache.org/xalan/) provide the most complete and accurate implementations of the XSLT specification. Furthermore, this approach offers the broadest compatibility with legacy Web browsers and servers, whereas the first approach requires a more recent browser than most users use, and the second approach requires special Web server software. In practice, though, requiring a different server is not nearly as onerous as requiring a particular client. You, yourself, can install your own special server software; but you cannot rely on your visitors to install particular client software.
On the CD-ROM
Xalan is on the CD-ROM in the directory utilities/xalan. SAXON is on the CD-ROM in the directory utilities/saxon.
Xalan is a Java 1.1 character mode application. To use it, you'll need a Java 1.1-compatible virtual machine such as Sun's Java Development Kit (JDK) or Java Runtime Environment (JRE), Apple's Macintosh Runtime for Java 2.2 (MRJ), or Microsoft's virtual machine. You'll need to set your CLASSPATH environment variable to include both the xalan.jar and xerces.jar files (both included in the Xalan distribution). On Unix/Linux you can set this in your .cshrc file if you use csh or tcsh, or in your .profile file if you use sh, ksh, or bash. On Windows 95/98 you can set it in AUTOEXEC.BAT. In Windows NT/2000, set it with the System Control Panel Environment tab.
Tip
If you're using the JRE 1.2 or later, you can just put the xalan.jar and xerces.jar files in your jre/lib/ext directory instead of mucking around with the CLASSPATH environment variable. If you've installed the JDK instead of the JRE on Windows, you may have two jre/lib/ext directories, one somewhere like C:\jdk1.3\jre\lib\ext and the other somewhere like C:\Program Files\Javasoft\jre\1.3\lib\ext. You need to copy both jar files into each ext directory. Putting one copy in one directory and an alias into the other directory does not work. You must place complete, actual copies into each ext directory.
Note
Although I primarily use Xalan in this chapter, the examples should work with SAXON or any other XSLT processor that implements the November 16, 1999 XSLT 1.0 recommendation.
The Java class containing the main method for Xalan is org.apache.xalan.xslt.Process. You can run Xalan by typing the following at the shell prompt or in a DOS window:
C:\> java org.apache.xalan.xslt.Process -in 17-1.xml -xsl 17-2.xsl -out 17-3.html
This line runs the java interpreter on the Java class containing the Xalan program's main() method, org.apache.xalan.xslt.Process. The source XML document following the -in flag is 17-1.xml. The XSLT style sheet follows the -xsl flag and is 17-2.xsl here; and the output HTML file follows the -out argument and is named 17-3.html. If the -out argument is omitted, the transformed document will be printed on the console. If the -xsl argument is omitted, Xalan will attempt to use the style sheet named by the xml-stylesheet processing instruction in the prolog of the input XML document.
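The same transformation can also be driven from your own Java code. The sketch below is a rough analogue of the Xalan command line using the standard javax.xml.transform API and the XSLT processor bundled with a current JDK, not Xalan's own Process class; the class name RunXslt, its positional arguments, and the trimmed sample document are assumptions made for illustration. Run without arguments, it applies Listing 17-2 to the built-in sample and prints the result.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RunXslt {

    // A trimmed version of Listing 17-1: two ATOMs with two children each.
    static final String SAMPLE_XML =
        "<?xml version=\"1.0\"?>\n"
      + "<PERIODIC_TABLE>\n"
      + "<ATOM STATE=\"GAS\"><NAME>Hydrogen</NAME> <SYMBOL>H</SYMBOL></ATOM>\n"
      + "<ATOM STATE=\"GAS\"><NAME>Helium</NAME> <SYMBOL>He</SYMBOL></ATOM>\n"
      + "</PERIODIC_TABLE>\n";

    // Listing 17-2; only the layout differs.
    static final String LISTING_17_2 =
        "<?xml version=\"1.0\"?>\n"
      + "<xsl:stylesheet version=\"1.0\"\n"
      + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
      + "  <xsl:template match=\"PERIODIC_TABLE\">\n"
      + "    <html><xsl:apply-templates/></html>\n"
      + "  </xsl:template>\n"
      + "  <xsl:template match=\"ATOM\">\n"
      + "    <P><xsl:apply-templates/></P>\n"
      + "  </xsl:template>\n"
      + "</xsl:stylesheet>\n";

    // Applies a style sheet held in a string to a document held in a string.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    // java RunXslt 17-1.xml 17-2.xsl 17-3.html
    // does roughly the job of: java org.apache.xalan.xslt.Process -in ... -xsl ... -out ...
    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            // No file names given: transform the built-in sample and print the result.
            System.out.println(transform(SAMPLE_XML, LISTING_17_2));
            return;
        }
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new File(args[1])))
            .transform(new StreamSource(new File(args[0])), new StreamResult(new File(args[2])));
    }
}
```

Because the result tree's root element is html, the processor picks the HTML output method by default, so no XML declaration is written. The exact white space in the output varies from processor to processor.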
Listing 17-2 transforms input documents to well-formed HTML files as discussed in Chapter 6. However, you can transform from any XML application to any other as long as you can write a style sheet to support the transformation. For example, you can imagine a style sheet that transforms from Vector Markup Language (VML) documents to Scalable Vector Graphics (SVG) documents:
% java org.apache.xalan.xslt.Process -in pinktriangle.vml -xsl VmlToSVG.xsl -out pinktriangle.svg
Most other command line XSLT processors behave similarly, though of course they'll have different command line arguments and options. They may prove slightly easier to use if they're not written in Java, since there won't be any need to configure the CLASSPATH.
Tip
If you're using Windows, you can use a stand-alone executable version of SAXON called Instant SAXON (http://users.iclway.co.uk/mhkay/saxon/instant.html) instead. This is a little easier to use because it doesn't require you to mess around with CLASSPATH environment variables. To transform a document with this program, simply place the saxon.exe file in your path and type:
C:\> saxon -o 17-3.html 17-1.xml 17-2.xsl
Listing 17-3 shows the output of running Listing 17-1 through Xalan with the XSLT style sheet in Listing 17-2. Notice that Xalan does not attempt to clean up the HTML it generates, which has a lot of white space. This is not important since ultimately you want to view the file in a Web browser that trims white space. Figure 17-2 shows Listing 17-3 loaded into Netscape Navigator 4.6. Because Listing 17-3 is standard HTML, you don't need an XML-capable browser to view it.
Listing 17-3: The HTML produced by applying the style sheet in Listing 17-2 to the XML in Listing 17-1
<html>
<P>
Hydrogen
H
1
1.00794
20.28
13.81
0.0000899
</P>
<P>
Helium
He
2
4.0026
4.216
0.95
0.0001785
</P>
</html>

Figure 17-2: The page produced by applying the style sheet in Listing 17-2 to the XML document in Listing 17-1.
Instead of preprocessing the XML file, you can send the client both the XML file and the XSLT file that describes how to render it. The client is responsible for applying the style sheet to the document and rendering it accordingly. This is more work for the client, but places much less load on the server. In this case, the XSLT style sheet must transform the document into an XML application the client understands. HTML is a likely choice, though in the future some browsers may understand XSL formatting objects as well.
Attaching an XSLT style sheet to an XML document is easy. Simply insert an xml-stylesheet processing instruction in the prolog immediately after the XML declaration. This processing instruction should have a type attribute with the value text/xml and an href attribute whose value is a URL pointing to the style sheet. For example:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="17-2.xsl"?>
This is also how you attach a CSS style sheet to a document. The only difference here is that the type attribute has the value text/xml instead of text/css.
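Programs can read this processing instruction, too. The JAXP method TransformerFactory.getAssociatedStylesheet looks for xml-stylesheet processing instructions in a document's prolog and returns a Source for the style sheet they point to, which is the JAXP counterpart of Xalan falling back on the processing instruction when -xsl is omitted. The sketch below assumes the processor bundled with a current JDK; the class name FindStylesheet is hypothetical, and which type values a given processor accepts is an implementation detail worth checking.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class FindStylesheet {

    // Just the prolog of Listing 17-1 plus an empty root element.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>\n"
      + "<PERIODIC_TABLE/>\n";

    // Returns the system ID of the style sheet named by the xml-stylesheet PI.
    // Passing null for media, title, and charset accepts the default
    // (non-alternate) style sheet without further filtering.
    static String associatedStylesheet(String xml) {
        try {
            Source s = TransformerFactory.newInstance().getAssociatedStylesheet(
                new StreamSource(new StringReader(xml)), null, null, null);
            return s == null ? null : s.getSystemId();
        } catch (TransformerConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The relative href is resolved against the document's location; this
        // in-memory document has none, so the absolute prefix depends on the
        // processor and the current directory.
        System.out.println(associatedStylesheet(XML));
    }
}
```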
Note
In the future, the more specific MIME media type application/xslt+xml will be available to distinguish XSLT documents from all other XML documents. Once XSLT processors are revised to support this, you will be able to write the xml-stylesheet processing instruction like this instead:
<?xml-stylesheet type="application/xslt+xml" href="17-2.xsl"?>
Internet Explorer 5.0 and 5.5's XSLT support differs from the November 16, 1999 recommendation in several ways. First, it expects that XSLT elements live in the http://www.w3.org/TR/WD-xsl namespace instead of the http://www.w3.org/1999/XSL/Transform namespace, although the xsl prefix is still used. Second, it expects the non-standard MIME type text/xsl in the xml-stylesheet processing instruction rather than text/xml. Finally, it does not implement the default rules for elements that match no template. Consequently, you need to provide a template for each element in the hierarchy starting from the root before trying to view a document in Internet Explorer.
Listing 17-4 demonstrates. The three rules match the root node, the root element PERIODIC_TABLE, and the ATOM elements in that order. Figure 17-3 shows the XML document in Listing 17-1 loaded into Internet Explorer 5.5 with this style sheet.
Listing 17-4: The style sheet of Listing 17-2 adjusted to work with Internet Explorer 5.0 and 5.5
<?xml version="1.0"?>
<!-- This is a non-standard style sheet designed just for
     Internet Explorer. It will not work with any standards
     compliant XSLT processor. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="PERIODIC_TABLE">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="ATOM">
    <P>
      <xsl:value-of select="."/>
    </P>
  </xsl:template>
</xsl:stylesheet>
Caution
Ideally, you would use the same XML document both for direct display and for prerendering to HTML. Unfortunately, that would require Microsoft to actually support the real XSLT specification. Microsoft has repeatedly promised to support this, and they have just as repeatedly reneged on those promises.

Figure 17-3: The page produced in Internet Explorer 5.5 by applying the style sheet in Listing 17-4 to the XML document in Listing 17-1.
Internet Explorer also fails to support many other parts of standard XSLT, while offering a number of nonstandard extensions.
If you've successfully installed MSXML3 in replace mode, then IE5 can handle most of XSLT 1.0 including the http://www.w3.org/1999/XSL/Transform namespace. However, even this version still has a few bugs, including expecting the text/xsl MIME type instead of text/xml.
In the rest of this chapter, I use only standard XSLT and simply prerender the file in HTML before loading it into a Web browser.
If you find something in this chapter doesn’t work in Internet Explorer, please complain to Microsoft, not to me.
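The default rules that Listing 17-4 had to work around are easy to see in a standard processor: give it a style sheet with no templates at all and the built-in rules still walk the whole tree, copying text nodes and skipping comments, processing instructions, and attributes. This sketch assumes the XSLT processor bundled with a current JDK (via javax.xml.transform); the class name BuiltInRules and the trimmed document are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BuiltInRules {

    // A trimmed Listing 17-1, keeping the STATE attributes and the comments.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>\n"
      + "<PERIODIC_TABLE>"
      + "<ATOM STATE=\"GAS\"><NAME>Hydrogen</NAME> <SYMBOL>H</SYMBOL> "
      + "<DENSITY><!-- At 300K, 1 atm -->0.0000899</DENSITY></ATOM>"
      + "<ATOM STATE=\"GAS\"><NAME>Helium</NAME> <SYMBOL>He</SYMBOL> "
      + "<DENSITY><!-- At 300K -->0.0001785</DENSITY></ATOM>"
      + "</PERIODIC_TABLE>";

    // No xsl:template elements at all; only the built-in rules apply.
    static final String EMPTY_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "</xsl:stylesheet>";

    // Applies a style sheet held in a string to a document held in a string.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, EMPTY_XSL));
    }
}
```

The output is just the character data of the elements; the STATE values, the comments, and the xml-stylesheet processing instruction do not appear.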
Template rules defined by xsl:template elements are the most important part of an XSLT style sheet. These associate particular output with particular input. Each xsl:template element has a match attribute that specifies which nodes of the input document the template is instantiated for. The content of the xsl:template element is the actual template to be instantiated. A template may contain both text that will appear literally in the output document and XSLT instructions that copy data from the input XML document to the result. Because all XSLT instructions are in the http://www.w3.org/1999/XSL/Transform namespace, it's easy to distinguish the literal data to be copied to the output from the instructions.
For example, here is a template that is applied to the root node of the input tree:
<xsl:template match="/">
  <html>
    <head>
    </head>
    <body>
    </body>
  </html>
</xsl:template>
When the XSLT processor reads the input document, the first node it sees is the root. This rule matches that root node, and tells the XSLT processor to emit this text:
<html>
<head>
</head>
<body>
</body>
</html>
This text is well-formed HTML. Because the XSLT document is itself an XML document, its contents — templates included — must be well-formed XML.
If you were to use the above rule, and only the above rule, in an XSLT style sheet, the output would be limited to the above six tags. That's because no instructions in the rule tell the formatter to move down the tree and look for further matches against the templates in the style sheet.
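A quick way to confirm that processing stops at the root is to run that single rule against the periodic table. The sketch assumes the XSLT processor bundled with a current JDK (via javax.xml.transform); the class name RootOnly and the trimmed document are invented for the example. The output carries the html, head, and body tags (the exact serialization depends on the processor's HTML output method) but none of the chemistry.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RootOnly {

    // A trimmed Listing 17-1.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<PERIODIC_TABLE>"
      + "<ATOM STATE=\"GAS\"><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>"
      + "<ATOM STATE=\"GAS\"><NAME>Helium</NAME><SYMBOL>He</SYMBOL></ATOM>"
      + "</PERIODIC_TABLE>";

    // The single rule from the text: it matches the root node and never
    // calls xsl:apply-templates, so nothing below the root is visited.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\">"
      + "<html><head></head><body></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies a style sheet held in a string to a document held in a string.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```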
To get beyond the root, you have to tell the formatting engine to process the children of the root. In general, to include content from the child nodes, you have to recursively process the nodes through the XML document. The element that does this is xsl:apply-templates. By including xsl:apply-templates in the output template, you tell the formatter to compare each child node of the matched source node against the templates in the style sheet and, if a match is found, output the template for the matched node. The template for the matched node may itself contain xsl:apply-templates elements to search for matches for its children. When the formatting engine processes a node, the node is treated as a complete tree. This is the advantage of the tree structure: each part can be treated the same way as the whole. For example, Listing 17-5 is an XSLT style sheet that uses the xsl:apply-templates element to process the child nodes.
Listing 17-5: An XSLT style sheet that recursively processes the children of the root
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="PERIODIC_TABLE">
    <body>
      <xsl:apply-templates/>
    </body>
  </xsl:template>
  <xsl:template match="ATOM">
    An Atom
  </xsl:template>
</xsl:stylesheet>
When this style sheet is applied to Listing 17-1, here's what happens:
1. The root node is compared with all template rules in the style sheet. It matches the first one.
2. The <html> tag is written out.
3. The xsl:apply-templates element causes the formatting engine to process the child nodes of the root node of the input document.
   A. The first child of the root, the xml-stylesheet processing instruction, is compared with the template rules. It doesn't match any of them, so no output is generated.
   B. The second child of the root node of the input document, the root element PERIODIC_TABLE, is compared with the template rules. It matches the second template rule.
   C. The <body> tag is written out.
   D. The xsl:apply-templates element in the body element causes the formatting engine to process the child nodes of PERIODIC_TABLE.
      a. The first child of the PERIODIC_TABLE element, that is, the Hydrogen ATOM element, is compared with the template rules. It matches the third template rule.
      b. The text "An Atom" is output.
      c. The second child of the PERIODIC_TABLE element, that is, the Helium ATOM element, is compared with the template rules. It matches the third template rule.
      d. The text "An Atom" is output.
   E. The </body> tag is written out.
4. The </html> tag is written out.
The end result is:
<html>
<body>
An Atom
An Atom
</body>
</html>
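The walk-through above can be replayed with a short program that applies Listing 17-5 to a trimmed copy of Listing 17-1. This is a sketch that assumes the XSLT processor bundled with a current JDK, reached through javax.xml.transform; the class name ApplyTemplatesDemo and the helper occurrences are hypothetical. The xml-stylesheet processing instruction is kept in the input to show that it produces no output (step A).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ApplyTemplatesDemo {

    // A trimmed Listing 17-1 with no white space between elements.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<?xml-stylesheet type=\"text/xml\" href=\"17-2.xsl\"?>\n"
      + "<PERIODIC_TABLE>"
      + "<ATOM STATE=\"GAS\"><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>"
      + "<ATOM STATE=\"GAS\"><NAME>Helium</NAME><SYMBOL>He</SYMBOL></ATOM>"
      + "</PERIODIC_TABLE>";

    // Listing 17-5 packed into one string.
    static final String LISTING_17_5 =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\"><html><xsl:apply-templates/></html></xsl:template>"
      + "<xsl:template match=\"PERIODIC_TABLE\"><body><xsl:apply-templates/></body></xsl:template>"
      + "<xsl:template match=\"ATOM\">An Atom</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies a style sheet held in a string to a document held in a string.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    // Counts non-overlapping occurrences of target in text.
    static int occurrences(String text, String target) {
        int count = 0;
        for (int i = text.indexOf(target); i >= 0; i = text.indexOf(target, i + target.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String out = transform(XML, LISTING_17_5);
        System.out.println(out);
        System.out.println("\"An Atom\" appears " + occurrences(out, "An Atom") + " times");
    }
}
```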
To replace the text "An Atom" with the name of the ATOM element as given by its NAME child, you need to specify that templates should be applied to the NAME children of the ATOM element. To choose a particular set of children instead of all children you supply xsl:apply-templates with a select attribute designating the children to be selected. For example:
<xsl:template match="ATOM">
  <xsl:apply-templates select="NAME"/>
</xsl:template>
Strictly speaking, the select attribute holds an XPath expression rather than a pattern, but simple element names like the one here look and behave just like the patterns used in the match attribute of the xsl:template element. For now, I'll stick to simple names of elements; but in the section on patterns for matching and selecting later in this chapter, you'll see many more possibilities for both select and match. If no select attribute is present, all child element, text, comment, and processing instruction nodes are selected. (Attribute and namespace nodes are not selected.)
The result of adding this rule to the style sheet of Listing 17-5 and applying it to Listing 17-1 is this:
<html>
<body>
Hydrogen
Helium
</body>
</html>
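Here is the same experiment as a program: Listing 17-5 with its ATOM rule replaced by the select="NAME" version. It is a sketch assuming the XSLT processor bundled with a current JDK (via javax.xml.transform); the class name SelectNames and the trimmed document are illustrative. The ATOMIC_WEIGHT children are left in the input to show that only the selected NAME children reach the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SelectNames {

    // A trimmed Listing 17-1.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<PERIODIC_TABLE>"
      + "<ATOM STATE=\"GAS\"><NAME>Hydrogen</NAME><ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT></ATOM>"
      + "<ATOM STATE=\"GAS\"><NAME>Helium</NAME><ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT></ATOM>"
      + "</PERIODIC_TABLE>";

    // Listing 17-5 with the ATOM rule applying templates only to NAME children.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\"><html><xsl:apply-templates/></html></xsl:template>"
      + "<xsl:template match=\"PERIODIC_TABLE\"><body><xsl:apply-templates/></body></xsl:template>"
      + "<xsl:template match=\"ATOM\"><xsl:apply-templates select=\"NAME\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies a style sheet held in a string to a document held in a string.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```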
The xsl:value-of element computes the value of something (most of the time, though not always, something in the input document) and copies it into the output document. The select attribute of the xsl:value-of element specifies exactly which something's value is being computed.
For example, suppose you want to replace the literal text An Atom with the name of the ATOM element as given by the contents of its NAME child. You can replace An Atom with <xsl:value-of select="NAME"/> like this:
<xsl:template match="ATOM">
  <xsl:value-of select="NAME"/>
</xsl:template>
Then, when you apply the style sheet to Listing 17-1, this text is generated:
<html>
<body>
Hydrogen
Helium
</body>
</html>
The item whose value is selected, the NAME element in this example, is relative to the current node. The current node is the item matched by the template, the particular ATOM element in this example. Thus, when the Hydrogen ATOM is matched by <xsl:template match="ATOM">, the Hydrogen ATOM's NAME is selected by xsl:value-of. When the Helium ATOM is matched by <xsl:template match="ATOM">, the Helium ATOM's NAME is selected by xsl:value-of.
The value of a node is always a string, possibly an empty string. The exact contents of this string depend on the type of the node. The most common type of node is element, and the value of an element node is particularly simple. It's the concatenation of all the character data (but not markup!) between the element's start tag and end tag. For example, the first ATOM element in Listing 17-1 is as follows:
<ATOM STATE="GAS">
<NAME>Hydrogen</NAME>
<SYMBOL>H</SYMBOL>
<ATOMIC_NUMBER>1</ATOMIC_NUMBER>
<ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
<BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT>
<MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
<DENSITY UNITS="grams/cubic centimeter">
<!-- At 300K, 1 atm -->
0.0000899
</DENSITY>
</ATOM>
The value of this element is shown below:
Hydrogen
H
1
1.00794
20.28
13.81
0.0000899
I calculated this value by stripping out all the tags and comments. Everything else including white space was left intact. The values of the other six node types are calculated similarly, mostly in obvious ways. Table 17-1 summarizes.
Table 17-1: Values of Nodes
| Node Type | Value |
| Root | The value of the root element |
| Element | The concatenation of all parsed character data contained in the element, including character data in any of the descendants of the element |
| Text | The text of the node; essentially the node itself |
| Attribute | The normalized attribute value as specified by Section 3.3.3 of the XML 1.0 recommendation; basically the attribute value after entities are resolved and white space is normalized; does not include the name of the attribute, the equals sign, or the quotation marks |
| Namespace | The URI of the namespace |
| Processing instruction | The data in the processing instruction; does not include the processing instruction target, the <?, or the ?> |
| Comment | The text of the comment, not including the <!-- and --> |
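If you'd like to check the element value calculation outside an XSLT processor, here is a minimal Python sketch using the standard library's xml.etree.ElementTree (my choice of tool; nothing in XSLT requires it). It approximates an element's value by joining all the text inside it with itertext(). ElementTree's default parser discards comments, which stands in for stripping them out, and attribute values never appear in the result. It models the idea only; it is not how an XSLT processor is implemented.

```python
import xml.etree.ElementTree as ET

# The first ATOM element from Listing 17-1
ATOM_XML = """<ATOM STATE="GAS">
  <NAME>Hydrogen</NAME>
  <SYMBOL>H</SYMBOL>
  <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
  <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
  <BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT>
  <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
  <DENSITY UNITS="grams/cubic centimeter">
    <!-- At 300K, 1 atm -->
    0.0000899
  </DENSITY>
</ATOM>"""


def string_value(elem: ET.Element) -> str:
    """Concatenate all character data in elem and its descendants.

    Tags and attribute values are not included. Comments never reach
    the tree because ElementTree's default parser drops them.
    """
    return "".join(elem.itertext())


atom = ET.fromstring(ATOM_XML)
value = string_value(atom)
# White space is preserved; split() just makes the pieces easy to see
print(value.split())
```

The printed list contains the seven pieces of character data from the NAME through DENSITY elements, and nothing from the STATE or UNITS attributes or the comment.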
The xsl:value-of element should only be used in contexts where it is obvious which node's value is being taken. If several nodes could be selected, only the first one in document order is used. For instance, this is a poor rule, because a typical
PERIODIC_TABLE element contains more than one ATOM:
<xsl:template match="PERIODIC_TABLE">
<xsl:value-of select="ATOM"/>
</xsl:template>
There are two ways of processing multiple elements in turn. The first method you've already seen. Simply use xsl:apply-templates with a select attribute that chooses the particular elements that you want to include, like this:
<xsl:template match="PERIODIC_TABLE">
<xsl:apply-templates select="ATOM"/>
</xsl:template>
<xsl:template match="ATOM">
<xsl:value-of select="."/>
</xsl:template>
The select="." in the second template tells the processor to take the value of the matched element, ATOM in this example.
The second option is xsl:for-each. The xsl:for-each element processes each element chosen by its select attribute in turn. Unlike xsl:apply-templates, however, it needs no additional template. For example:
<xsl:template match="PERIODIC_TABLE">
<xsl:for-each select="ATOM">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
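To see the difference between taking one value and iterating in miniature, here is a rough Python sketch. The hypothetical value_of() helper mimics xsl:value-of by using only the first selected node in document order, while the hypothetical for_each() helper mimics xsl:for-each by producing one value per selected node. The paths use ElementTree's limited XPath subset, not full XPath.

```python
import xml.etree.ElementTree as ET

TABLE_XML = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>
  <ATOM><NAME>Helium</NAME><ATOMIC_NUMBER>2</ATOMIC_NUMBER></ATOM>
</PERIODIC_TABLE>"""


def value_of(context: ET.Element, path: str) -> str:
    """Like xsl:value-of: string value of the FIRST selected node, or ''."""
    selected = context.findall(path)  # ElementTree returns document order
    return "".join(selected[0].itertext()) if selected else ""


def for_each(context: ET.Element, path: str) -> list:
    """Like xsl:for-each with <xsl:value-of select="."/> in its body."""
    return ["".join(node.itertext()) for node in context.findall(path)]


table = ET.fromstring(TABLE_XML)
print(value_of(table, "ATOM"))  # only the first ATOM's text
print(for_each(table, "ATOM"))  # one string per ATOM
```

The first call silently drops Helium, which is exactly why the PERIODIC_TABLE rule with xsl:value-of select="ATOM" is a poor one.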
The match attribute of the xsl:template element supports a complex syntax that allows you to express exactly which nodes you do and do not want to match. The select attribute of xsl:apply-templates, xsl:value-of, xsl:for-each, xsl:copy-of, and xsl:sort supports an even more powerful superset of this syntax called XPath that allows you to express exactly which nodes you do
and do not want to select. Various patterns for matching and selecting nodes are discussed below.
In order that the output document be well-formed, the first thing output from an XSL transformation should be the output document's
root element. Consequently, XSLT style sheets generally start with a rule that applies to the root node. To specify the root
node in a rule, you give its match attribute the value "/". For example:
<xsl:template match="/">
<DOCUMENT>
<xsl:apply-templates/>
</DOCUMENT>
</xsl:template>
This rule applies to the root node and only the root node of the input tree. When the root node is read, the tag <DOCUMENT> is output, the children of the root node are processed, then the </DOCUMENT> tag is output. This rule overrides the default rule for the root node. Listing 17-6 shows a style sheet with a single rule
that applies to the root node.
Listing 17-6: An XSLT style sheet with one rule for the root node
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>Atomic Number vs. Atomic Weight</title>
</head>
<body>
<table>
Atom data will go here
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Because this style sheet only provides a rule for the root node, and because that rule's template does not specify any further processing of child nodes, only literal output that's included in the template is inserted in the resulting document. In other words, the result of applying the style sheet in Listing 17-6 to Listing 17-1 (or any other well-formed XML document) is this:
<html>
<head>
<title>Atomic Number vs. Atomic Weight</title>
</head>
<body>
<table>
Atom data will go here
</table>
</body>
</html>
As previously mentioned, the most basic pattern contains a single element name that matches all elements with that name. For
example, this template matches ATOM elements and makes their ATOMIC_NUMBER children bold:
<xsl:template match="ATOM">
<b><xsl:value-of select="ATOMIC_NUMBER"/></b>
</xsl:template>
Listing 17-7 demonstrates a style sheet that expands on Listing 17-6. First, an xsl:apply-templates element is included in the template rule for the root node. This rule uses a select attribute to ensure that only PERIODIC_TABLE elements are processed.
Second, a rule that only applies to PERIODIC_TABLE elements is created using match="PERIODIC_TABLE". This rule sets up the header for the table, and then applies templates to form the body of the table from ATOM elements.
Finally, the ATOM rule specifically selects the ATOM element's NAME, ATOMIC_NUMBER, and ATOMIC_WEIGHT child elements with <xsl:value-of select="NAME"/>, <xsl:value-of select="ATOMIC_NUMBER"/>, and <xsl:value-of select="ATOMIC_WEIGHT"/>. These are wrapped up inside HTML's tr and td elements, so that the end result is a table of atomic numbers matched to atomic weights. Figure 17-4 shows the output of
applying the style sheet in Listing 17-7 to the complete periodic table document displayed in Netscape Navigator.
One thing you may wish to note about this style sheet: The exact order of the NAME, ATOMIC_NUMBER, and ATOMIC_WEIGHT elements in the input document is irrelevant. They appear in the output in the order they were selected; that is, first name, then number,
then weight. The individual atoms, by contrast, appear in the same order as in the input document, which is alphabetical. Later,
you'll see how to use an xsl:sort element to change that so you can arrange the atoms in the more conventional atomic number order.
Listing 17-7: Templates applied to specific classes of element with select
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>Atomic Number vs. Atomic Weight</title>
</head>
<body>
<xsl:apply-templates select="PERIODIC_TABLE"/>
</body>
</html>
</xsl:template>
<xsl:template match="PERIODIC_TABLE">
<h1>Atomic Number vs. Atomic Weight</h1>
<table>
<tr>
<th>Element</th>
<th>Atomic Number</th>
<th>Atomic Weight</th>
</tr>
<xsl:apply-templates select="ATOM"/>
</table>
</xsl:template>
<xsl:template match="ATOM">
<tr>
<td><xsl:value-of select="NAME"/></td>
<td><xsl:value-of select="ATOMIC_NUMBER"/></td>
<td><xsl:value-of select="ATOMIC_WEIGHT"/></td>
</tr>
</xsl:template>
</xsl:stylesheet>

Figure 17-4: A table showing atomic number versus atomic weight in Netscape Navigator
Sometimes you want a single template to apply to more than one element. You can indicate that a template matches all elements
by using the asterisk wildcard (*) in place of an element name in the match attribute. For example, this template says that all elements should be wrapped in a P element:
<xsl:template match="*">
<P>
<xsl:value-of select="."/>
</P>
</xsl:template>
Of course, this is probably more than you want. You'd like to use the template rules already defined for PERIODIC_TABLE and ATOM elements, as well as the root node, and only use this rule for the other elements. Fortunately, you can. When two
rules both match a single node, by default the more specific one takes precedence. In this case, that means that ATOM elements will use the template with match="ATOM" instead of the template that merely has match="*". However, NAME, BOILING_POINT, ATOMIC_NUMBER, and other elements that don't match a more specific template will activate the match="*" template.
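"More specific" is actually backed by numbers. Each XSLT 1.0 template without a priority attribute gets a default priority: 0 for a plain name such as ATOM or @UNITS, -0.25 for a prefix:* test, -0.5 for * and bare node tests such as text(), and 0.5 for anything more complex, such as ATOM/SYMBOL or ATOM[MELTING_POINT]. The Python below is a simplified sketch of those rules for single (non-union) patterns like the ones in this chapter; it does not parse full XPath and ignores import precedence. pick_template() is a hypothetical helper that assumes every pattern passed to it already matches the node.

```python
import re

# A QName such as ATOM or svg:rect (simplified character rules)
QNAME = r"[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?"
# Node tests that get the lowest default priority
BARE_NODE_TESTS = {"*", "node()", "text()", "comment()", "processing-instruction()"}


def default_priority(pattern: str) -> float:
    """Default priority of a single (non-union) XSLT 1.0 match pattern."""
    # Remove an optional child::, attribute::, or @ in front of the step
    step = re.sub(r"^(?:child::|attribute::|@)", "", pattern.strip())
    if re.fullmatch(QNAME, step) or re.fullmatch(
            r"processing-instruction\(\s*(['\"]).*\1\s*\)", step):
        return 0.0    # ATOM, @UNITS, processing-instruction('xml-stylesheet')
    if re.fullmatch(r"[A-Za-z_][\w.-]*:\*", step):
        return -0.25  # svg:*
    if step in BARE_NODE_TESTS:
        return -0.5   # *, @*, text(), comment()
    return 0.5        # ATOM/SYMBOL, ATOM[MELTING_POINT], /, id('e47'), ...


def pick_template(matching_patterns):
    """Pick the highest-priority pattern; all are assumed to match the node."""
    return max(matching_patterns, key=default_priority)


print(pick_template(["*", "ATOM"]))                    # ATOM
print(pick_template(["ATOM", "ATOM[MELTING_POINT]"]))  # ATOM[MELTING_POINT]
```

When two matching patterns tie, the XSLT 1.0 recommendation treats it as an error that a processor may recover from by choosing the last template in the style sheet; Python's max() here simply returns the first.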
You can place a namespace prefix in front of the asterisk to indicate that only elements in a particular namespace should
be matched. For example, this template matches all SVG elements, presuming that the prefix svg is mapped to the normal SVG namespace URI http://www.w3.org/2000/svg in the style sheet.
<xsl:template match="svg:*">
<DIV>
<xsl:value-of select="."/>
</DIV>
</xsl:template>
Of course, Listing 17-1 doesn't contain any elements from this namespace, so this template wouldn't produce any output. However, it might when applied to a different document that did include some SVG.
You're not limited to the children of the current node in match attributes. You can use the / symbol to match specified hierarchies of elements. Used alone, the / symbol refers to the root node. However, you can use it between two names to indicate that the second is the child of the
first. For example, ATOM/NAME refers to NAME elements that are children of ATOM elements.
In xsl:template elements, this enables you to match only some of the elements of a given kind. For example, this template rule marks SYMBOL elements that are children of ATOM elements strong. It does nothing to SYMBOL elements that are not direct children of ATOM elements.
<xsl:template match="ATOM/SYMBOL">
<strong><xsl:value-of select="."/></strong>
</xsl:template>
Caution
Remember that this rule selects SYMBOL elements that are children of ATOM elements, not ATOM elements that have SYMBOL children. In other words, the . in <xsl:value-of select="."/> refers to the SYMBOL and not to the ATOM.
You can specify deeper matches by stringing patterns together. For example, PERIODIC_TABLE/ATOM/NAME selects NAME elements whose parent is an ATOM element whose parent is a PERIODIC_TABLE element.
You can also use the * wild card to substitute for an arbitrary element name in a hierarchy. For example, this template rule applies to all SYMBOL elements that are grandchildren of a PERIODIC_TABLE element.
<xsl:template match="PERIODIC_TABLE/*/SYMBOL">
<strong><xsl:value-of select="."/></strong>
</xsl:template>
Finally, as you saw above, a / by itself selects the root node of the document. At the start of a longer pattern, it anchors the pattern at the root node. For example, this rule applies to a PERIODIC_TABLE element only when it is the root element of the document:
<xsl:template match="/PERIODIC_TABLE">
<html><xsl:apply-templates/></html>
</xsl:template>
While / refers to the root node, /* refers to the root element, whatever it is. For example, this template doesn't care whether the root element is PERIODIC_TABLE, DOCUMENT, or SCHENECTADY. It produces the same output in all cases.
<xsl:template match="/*">
<html>
<head>
<title>Atomic Number vs. Atomic Weight</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
Sometimes, especially with an uneven hierarchy, you may find it easier to bypass intermediate nodes and simply select all
the elements of a given type, whether they're immediate children, grandchildren, great-grandchildren, or what have you. The
double slash, //, refers to a descendant element at an arbitrary level. For example, this template rule applies to all NAME descendants of PERIODIC_TABLE, no matter how deep:
<xsl:template match="PERIODIC_TABLE//NAME">
<i><xsl:value-of select="."/></i>
</xsl:template>
The periodic table example is fairly shallow, but this trick becomes more important in deeper hierarchies, especially when
an element can contain other elements of its own type (for example, if an ATOM could contain another ATOM).
The // operator at the beginning of a pattern selects any descendant of the root node. For example, this template rule processes
all ATOMIC_NUMBER elements while completely ignoring their location:
<xsl:template match="//ATOMIC_NUMBER">
<i><xsl:value-of select="."/></i>
</xsl:template>
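Match patterns are easiest to understand if you read them from right to left: the last step must fit the node itself, and each step before it must fit the parent (for /) or some ancestor (for //). The following Python sketch implements that idea for patterns made only of names, *, /, and //. It does not handle predicates, attributes, or the bare / pattern, and the GROUP element in its sample document is hypothetical, added only to create a deeper level.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: GROUP is not part of Listing 17-1; it only adds depth.
DOC = ("<PERIODIC_TABLE>"
       "<ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>"
       "<GROUP><ATOM><NAME>Lithium</NAME><SYMBOL>Li</SYMBOL></ATOM></GROUP>"
       "</PERIODIC_TABLE>")


def _match(elem, tokens, parents):
    """Match tokens such as ['PERIODIC_TABLE', '//', 'NAME'] right to left."""
    name = tokens[-1]
    if name != "*" and elem.tag != name:
        return False                      # the last step must fit elem itself
    if len(tokens) == 1:
        return True                       # relative pattern: any location
    sep, rest = tokens[-2], tokens[:-2]
    parent = parents.get(elem)            # None: elem is the root element
    if not rest:                          # pattern starts with / or //
        return parent is None if sep == "/" else True
    if sep == "/":                        # the parent must match the rest
        return parent is not None and _match(parent, rest, parents)
    while parent is not None:             # "//": some ancestor must match
        if _match(parent, rest, parents):
            return True
        parent = parents.get(parent)
    return False


def find_matches(root, pattern):
    """Every element under (and including) root that matches pattern."""
    parents = {child: parent for parent in root.iter() for child in parent}
    tokens = re.findall(r"//|/|[^/]+", pattern)
    return [e for e in root.iter() if _match(e, tokens, parents)]


root = ET.fromstring(DOC)
for pattern in ["ATOM/SYMBOL", "PERIODIC_TABLE/*/SYMBOL",
                "PERIODIC_TABLE//NAME", "/*", "//SYMBOL"]:
    print(pattern, [e.text or e.tag for e in find_matches(root, pattern)])
```

Note that PERIODIC_TABLE/*/SYMBOL finds only H, because Li is a great-grandchild of PERIODIC_TABLE, while PERIODIC_TABLE//NAME reaches both names regardless of depth.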
You may want to apply a particular style to a particular single element without changing all other elements of that type.
The simplest way to do that in XSLT is to attach a style to the element's ID type attribute. This is done with the id() selector, which contains the ID value in single quotes. For example, this rule makes the element with the ID e47 bold:
<xsl:template match="id('e47')">
<b><xsl:value-of select="."/></b>
</xsl:template>
This assumes, of course, that the elements that you want to select in this fashion have an attribute declared as type ID in the source document's DTD. This may not be the case, however. For one thing, many documents do not have DTDs. They're
merely well-formed, not valid. And even if they have a DTD, there's no guarantee that any element has an ID type attribute.
Cross-Reference
ID-type attributes are not simply attributes with the name ID. ID-type attributes are discussed in Chapter 11.
As you saw in Chapter 5, the @ sign matches against attributes and selects nodes according to attribute names. Simply prefix the name of the attribute that
you want to select with the @ sign. For example, this template rule matches UNITS attributes, and wraps them in an I element.
<xsl:template match="@UNITS">
<I><xsl:value-of select="."/></I>
</xsl:template>
However, merely adding this rule to the style sheet will not automatically produce italicized units in the output, because
attributes are not children of the elements that contain them. Therefore, by default, an XSLT processor walking the
tree does not see attribute nodes. You have to process them explicitly, using xsl:apply-templates with an appropriate select attribute. Listing 17-8 demonstrates with a style sheet that outputs a table of atomic numbers versus melting points. Not
only is the value of the MELTING_POINT element written out, so is the value of its UNITS attribute. This is selected by <xsl:apply-templates select="@UNITS"/> in the template rule for MELTING_POINT elements.
Listing 17-8: An XSLT style sheet that selects the UNITS attribute with @
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/PERIODIC_TABLE">
<html>
<body>
<h1>Atomic Number vs. Melting Point</h1>
<table>
<tr>
<th>Element</th>
<th>Atomic Number</th>
<th>Melting Point</th>
</tr>
<xsl:apply-templates/>
</table>
</body>
</html>
</xsl:template>
<xsl:template match="ATOM">
<tr>
<td><xsl:value-of select="NAME"/></td>
<td><xsl:value-of select="ATOMIC_NUMBER"/></td>
<td><xsl:apply-templates select="MELTING_POINT"/></td>
</tr>
</xsl:template>
<xsl:template match="MELTING_POINT">
<xsl:value-of select="."/>
<xsl:apply-templates select="@UNITS"/>
</xsl:template>
<xsl:template match="@UNITS">
<I><xsl:value-of select="."/></I>
</xsl:template>
</xsl:stylesheet>
Recall that the value of an attribute node is simply the normalized string value of the attribute. Once you apply the style
sheet in Listing 17-8, ATOM elements come out formatted like this:
<tr>
<td>Hydrogen</td><td>1</td><td>13.81<I>Kelvin</I></td>
</tr>
<tr>
<td>Helium</td><td>2</td><td>0.95<I>Kelvin</I></td>
</tr>
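The flow of control in Listing 17-8 can be mirrored in plain Python, which makes it clear that the UNITS attribute is only reached because the MELTING_POINT template asks for it. This is a rough sketch: the function names are hypothetical stand-ins for the three template rules, and the output strings are built without any escaping.

```python
import xml.etree.ElementTree as ET

ATOM_XML = """<ATOM STATE="GAS">
  <NAME>Hydrogen</NAME>
  <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
  <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT>
</ATOM>"""


def units_template(units: str) -> str:
    # Stand-in for <xsl:template match="@UNITS">: wrap the value in I
    return "<I>" + units + "</I>"


def melting_point_template(mp: ET.Element) -> str:
    # The element's own value first...
    out = "".join(mp.itertext())
    # ...then the UNITS attribute, reached only because we ask for it
    units = mp.get("UNITS")
    if units is not None:
        out += units_template(units)
    return out


def atom_template(atom: ET.Element) -> str:
    # One table row: name, atomic number, melting point (if any)
    cells = [
        atom.findtext("NAME", default=""),
        atom.findtext("ATOMIC_NUMBER", default=""),
        "".join(melting_point_template(mp) for mp in atom.findall("MELTING_POINT")),
    ]
    return "<tr>" + "".join("<td>" + c + "</td>" for c in cells) + "</tr>"


print(atom_template(ET.fromstring(ATOM_XML)))
```

Leaving out the units_template() call in melting_point_template() would drop the units entirely, just as omitting xsl:apply-templates select="@UNITS" does in the style sheet.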
You can combine attributes with elements using the various hierarchy operators. For example, the pattern BOILING_POINT/@UNITS refers to the UNITS attribute of a BOILING_POINT element, and ATOM/*/@UNITS matches any UNITS attribute of a child element of an ATOM element. This is especially helpful when matching against attributes in template rules. You must remember that what's being
matched is the attribute node, not the element that contains it. It's a very common mistake to confuse the attribute
node with the element node that contains it. For example, consider this rule, which attempts to apply templates to all child
elements that have UNITS attributes:
<xsl:template match="ATOM">
<xsl:apply-templates select="@UNITS"/>
</xsl:template>
What it actually does is apply templates to the nonexistent UNITS attributes of ATOM elements.
You can also use the @* wild card to match all attributes of an element, for example BOILING_POINT/@* to match all attributes of BOILING_POINT elements. You can also add a namespace prefix after the @ to match all attributes in a declared namespace. For instance, @xlink:* matches all the XLink attributes, such as xlink:show, xlink:type, and xlink:href, assuming the xlink prefix is mapped to the http://www.w3.org/1999/xlink XLink namespace URI.
Most of the time you should simply ignore comments in XML documents. Making comments an essential part of a document is a very bad idea. Nonetheless, XSLT does provide a means to match a comment if you absolutely have to.
To match a comment, use the comment() pattern. Although this pattern has function-like parentheses, it never actually takes any arguments. For example, this template
rule italicizes all comments:
<xsl:template match="comment()">
<i><xsl:value-of select="."/></i>
</xsl:template>
To distinguish between different comments, you have to look at the comments' parent and ancestors. For example, recall that
a DENSITY element looks like this:
<DENSITY UNITS="grams/cubic centimeter">
<!-- At 300K, 1 atm -->
0.0000899
</DENSITY>
You can use the hierarchy operators to select particular comments. For example, this rule only matches comments that occur
inside DENSITY elements:
<xsl:template match="DENSITY/comment()">
<i><xsl:value-of select="."/></i>
</xsl:template>
The only reason Listing 17-1 uses a comment to specify conditions instead of an attribute or element is precisely for this
example. In practice, you should never put important information in comments. The real reason XSLT allows you to select comments
is so that a style sheet can transform from one XML application to another while leaving the comments intact. Any other use
indicates a poorly designed original document. The following rule matches all comments, and copies them back out again using
the xsl:comment element.
<xsl:template match="comment()">
<xsl:comment><xsl:value-of select="."/></xsl:comment>
</xsl:template>
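Many general-purpose XML tools drop comments too. In Python's xml.etree.ElementTree, for instance, comments are discarded unless you build the tree with TreeBuilder(insert_comments=True), which requires Python 3.8 or later. The sketch below keeps them, finds those whose parent is a DENSITY element (a rough analog of DENSITY/comment()), and copies every comment back out as comment markup.

```python
import xml.etree.ElementTree as ET

DENSITY_XML = """<ATOM>
  <NAME>Hydrogen</NAME>
  <DENSITY UNITS="grams/cubic centimeter">
    <!-- At 300K, 1 atm -->
    0.0000899
  </DENSITY>
</ATOM>"""

# By default ElementTree throws comments away; ask it to keep them
# (insert_comments requires Python 3.8 or later).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
atom = ET.fromstring(DENSITY_XML, parser=parser)

# Rough analog of match="DENSITY/comment()": comments whose parent is DENSITY
density_comments = [node.text for density in atom.iter("DENSITY")
                    for node in density if node.tag is ET.Comment]

# Rough analog of copying every comment through with xsl:comment
copied = ["<!--" + node.text + "-->" for node in atom.iter()
          if node.tag is ET.Comment]

print(density_comments)
print(copied)
```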
When it comes to writing structured, intelligible, maintainable XML, processing instructions aren't much better than comments. However, there are occasional genuine needs for them, including attaching style sheets to documents.
The processing-instruction() function matches processing instructions. The optional argument to processing-instruction() is a quoted string giving the target of the processing instruction to select. If you do not include an argument, any processing instruction is matched. For example, this rule matches the processing instruction children
of the root node (most likely the xml-stylesheet processing instruction). The xsl:processing-instruction element inserts a processing instruction with the specified name and value into the output document.
<xsl:template match="/processing-instruction()">
<xsl:processing-instruction name="xml-stylesheet">
type="text/xml" href="auto.xsl"
</xsl:processing-instruction>
</xsl:template>
This rule also matches the xml-stylesheet processing instruction, but by its name:
<xsl:template
match="processing-instruction('xml-stylesheet')">
<xsl:processing-instruction name="xml-stylesheet">
<xsl:value-of select="."/>
</xsl:processing-instruction>
</xsl:template>
In fact, one of the primary reasons for distinguishing between the root element and the root node is so that processing instructions
from the prolog can be read and processed. Although the xml-stylesheet processing instruction uses a name = value syntax, XSL does not consider these to be attributes because processing instructions
are not elements. The value of a processing instruction is simply everything between the white space following its name and
the closing ?>.
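You can see the same distinction with Python's xml.dom.minidom, which keeps processing instructions from the prolog as children of the document node. Its ProcessingInstruction objects expose the name as target and everything after it as data, a single string. Pulling the pseudo-attributes apart is up to you; the regular expression here is a simplification that only handles double-quoted values. The sample uses href, the pseudo-attribute the xml-stylesheet processing instruction uses to point at a style sheet.

```python
import re
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="auto.xsl"?>
<PERIODIC_TABLE/>"""

doc = minidom.parseString(DOC)

# Processing instructions in the prolog are children of the document
# (root) node, not of the root element.
pis = [n for n in doc.childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
pi = pis[0]
print(pi.target)  # the name, which is not part of the value
print(pi.data)    # the value: one plain string, not attributes

# The name="value" pairs are just text; this regex is a simplification
# that only handles double-quoted values.
pseudo = dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', pi.data))
print(pseudo)
```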
Text nodes are generally ignored as nodes, although their values are included as part of the value of a selected element.
However, the text() operator does enable you to select the text node children of an element specifically. Despite the parentheses, this operator takes
no arguments. For example, this rule emboldens all text:
<xsl:template match="text()">
<b><xsl:value-of select="."/></b>
</xsl:template>
The main reason this operator exists is for the default rules. XSLT processors must provide the following built-in rule whether the author specifies it or not. (The same rule also outputs the value of an attribute node whenever templates are explicitly applied to one.)
<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>
This means that whenever a template is applied to a text node, the text of the node is output. If you do not want the default behavior, you can override it. For example, including the following empty template rule in your style sheet will prevent text nodes from being output unless specifically matched by another rule.
<xsl:template match="text()">
</xsl:template>
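The effect of the built-in rules can be modeled with a short recursive walk. In this Python sketch, each element simply processes its children (as the built-in element rule does) and each piece of text is handed to a text rule: one version copies the text through, the other mimics the empty override. ElementTree stores text in .text and .tail attributes rather than as separate nodes, so the walker visits both. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = "<ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL></ATOM>"


def apply_builtin(elem, text_rule):
    """Walk the tree the way the built-in rules do: each element just
    processes its children, and each piece of text goes to text_rule."""
    out = [text_rule(elem.text)] if elem.text else []
    for child in elem:
        out.extend(apply_builtin(child, text_rule))
        if child.tail:                      # text after the child's end tag
            out.append(text_rule(child.tail))
    return out


def default_text_rule(text):
    # Like the built-in <xsl:value-of select="."/> for text nodes
    return text


def empty_text_rule(text):
    # Like the empty <xsl:template match="text()"> override
    return ""


root = ET.fromstring(DOC)
print("".join(apply_builtin(root, default_text_rule)))
print(repr("".join(apply_builtin(root, empty_text_rule))))
```

With only the built-in rules, the output is just the document's text run together; with the empty override, nothing comes out at all.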
The vertical bar (|) allows a template rule to match multiple patterns. If a node matches one pattern or the other, it will activate the template.
For example, this template rule matches both ATOMIC_NUMBER and ATOMIC_WEIGHT elements:
<xsl:template match="ATOMIC_NUMBER|ATOMIC_WEIGHT">
<B><xsl:apply-templates/></B>
</xsl:template>
You can include white space around the | if that makes the code clearer. For example,
<xsl:template match="ATOMIC_NUMBER | ATOMIC_WEIGHT">
<B><xsl:apply-templates/></B>
</xsl:template>
You can also use more than two patterns in sequence. For example, this template rule matches ATOMIC_NUMBER, ATOMIC_WEIGHT, and SYMBOL elements:
<xsl:template match="ATOMIC_NUMBER | ATOMIC_WEIGHT | SYMBOL">
<B><xsl:apply-templates/></B>
</xsl:template>
The / operator is evaluated before the | operator. Thus, the following template rule matches an ATOMIC_NUMBER child of an ATOM, or an ATOMIC_WEIGHT of unspecified parentage, not an ATOMIC_NUMBER child of an ATOM or an ATOMIC_WEIGHT child of an ATOM.
<xsl:template match="ATOM/ATOMIC_NUMBER|ATOMIC_WEIGHT">
<B><xsl:apply-templates/></B>
</xsl:template>
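Because / binds more tightly than |, the pattern ATOM/ATOMIC_NUMBER|ATOMIC_WEIGHT is the union of two separate patterns. This Python sketch evaluates each side with ElementTree's path syntax and then reports the matched elements in document order. The SUMMARY element and its contents are hypothetical, included only to give ATOMIC_NUMBER and ATOMIC_WEIGHT a parent other than ATOM.

```python
import xml.etree.ElementTree as ET

# SUMMARY is hypothetical; it gives the two elements a parent other than ATOM.
DOC = ("<PERIODIC_TABLE>"
       "<ATOM><ATOMIC_NUMBER>1</ATOMIC_NUMBER>"
       "<ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT></ATOM>"
       "<SUMMARY><ATOMIC_NUMBER>summary-number</ATOMIC_NUMBER>"
       "<ATOMIC_WEIGHT>summary-weight</ATOMIC_WEIGHT></SUMMARY>"
       "</PERIODIC_TABLE>")
root = ET.fromstring(DOC)

# "/" binds tighter than "|": (ATOM/ATOMIC_NUMBER) | (ATOMIC_WEIGHT)
left = set(root.findall(".//ATOM/ATOMIC_NUMBER"))   # only under ATOM
right = set(root.findall(".//ATOMIC_WEIGHT"))        # any parent
union = left | right

# Report the union in document order, each node once
matched = [e.text for e in root.iter() if e in union]
print(matched)
```

The SUMMARY element's ATOMIC_WEIGHT is matched but its ATOMIC_NUMBER is not, which is the behavior described above.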
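So far, I've merely tested for the presence of various nodes. However, you can test for more details about the nodes that
match a pattern using []. You can perform many different tests, including whether an element has a particular child element or attribute and whether the value of an element or attribute equals a given string.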
For example, seaborgium, element 106, has only been created in microscopic quantities. Even its most long-lived isotope has
a half-life of only 30 seconds. With such a hard-to-create, short-lived element, it's virtually impossible to measure the
density, melting point, and other bulk properties. Consequently, the periodic table document omits the elements describing
the bulk properties of seaborgium and similar atoms because the data simply doesn’t exist. If you want to create a table of
atomic number versus melting point, you should omit those elements with unknown melting points. To do this, you can provide
one template for ATOM elements that have MELTING_POINT children and another one for elements that don't, like this:
<!-- Include nothing for arbitrary atoms -->
<xsl:template match="ATOM" />
<!-- Include a table row for atoms that do have
melting points. This rule will override the
previous one for those atoms that do have
melting points. -->
<xsl:template match="ATOM[MELTING_POINT]">
<tr>
<td><xsl:value-of select="NAME"/></td>
<td><xsl:value-of select="MELTING_POINT"/></td>
</tr>
</xsl:template>
Note that here it is the ATOM element that is matched, not the MELTING_POINT element, as it would be with ATOM/MELTING_POINT.
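ElementTree's path syntax supports the same kind of child-element test, so the distinction between the two patterns is easy to demonstrate in Python. The sample document is a trimmed-down, illustrative fragment with one atom that has a melting point and one that does not.

```python
import xml.etree.ElementTree as ET

DOC = ("<PERIODIC_TABLE>"
       "<ATOM><NAME>Hydrogen</NAME>"
       "<MELTING_POINT UNITS='Kelvin'>13.81</MELTING_POINT></ATOM>"
       "<ATOM><NAME>Seaborgium</NAME></ATOM>"
       "</PERIODIC_TABLE>")
root = ET.fromstring(DOC)

# ATOM[MELTING_POINT]: the ATOM elements themselves, filtered by a child test
with_mp = [atom.findtext("NAME") for atom in root.findall("ATOM[MELTING_POINT]")]

# ATOM/MELTING_POINT: the MELTING_POINT children instead
mps = [mp.text for mp in root.findall("ATOM/MELTING_POINT")]

print(with_mp)
print(mps)
```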
The test brackets can contain more than simply a child-element name. In fact, they can contain any XPath expression. (XPath
expressions, discussed in the next section, are a superset of match patterns.) The expression is evaluated with the candidate element as the context node; if the result is a non-empty node set, or otherwise converts to true, the element matches the whole pattern. For example, this template rule matches ATOM elements with NAME or SYMBOL children.
<xsl:template match="ATOM[NAME | SYMBOL]">
</xsl:template>
This template rule matches ATOM elements with a DENSITY child element that has a UNITS attribute:
<xsl:template match="ATOM[DENSITY/@UNITS]">
</xsl:template>
To revisit an earlier example, to correctly find all child elements that have UNITS attributes, use * to find all elements and [@UNITS] to winnow those down to the ones with UNITS attributes, like this:
<xsl:template match="ATOM">
<xsl:apply-templates select="*[@UNITS]"/>
</xsl:template>
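In Python's ElementTree the same mistake and its fix look like this. Asking the ATOM for its own UNITS attribute finds nothing, while the path *[@UNITS] returns the child elements that carry one. The sample is a trimmed copy of the Hydrogen ATOM element.

```python
import xml.etree.ElementTree as ET

ATOM_XML = ("<ATOM STATE='GAS'><NAME>Hydrogen</NAME>"
            "<BOILING_POINT UNITS='Kelvin'>20.28</BOILING_POINT>"
            "<MELTING_POINT UNITS='Kelvin'>13.81</MELTING_POINT>"
            "<DENSITY UNITS='grams/cubic centimeter'>0.0000899</DENSITY>"
            "</ATOM>")
atom = ET.fromstring(ATOM_XML)

# Like select="@UNITS" from an ATOM: the ATOM's own attribute, which it lacks
print(atom.get("UNITS"))

# Like select="*[@UNITS]": the child elements that have a UNITS attribute
print([child.tag for child in atom.findall("*[@UNITS]")])
```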
One type of pattern testing that proves especially useful is string equality. An equals sign (=) can test whether the value of a node identically matches a given string. For example, this template finds the ATOM element that contains an ATOMIC_NUMBER element whose content is the string 10 (Neon).
<xsl:template match="ATOM[ATOMIC_NUMBER='10']">
This is Neon!
</xsl:template>
Testing against element content may seem extremely tricky because of the need to get the value exactly right, including white
space. You may find it easier to test against attribute values since those are less likely to contain insignificant white
space. For example, the style sheet in Listing 17-9 applies templates only to those ATOM elements whose STATE attribute value is the three letters GAS.
Listing 17-9: An XSLT style sheet that selects only those ATOM elements whose STATE attribute has the value GAS
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PERIODIC_TABLE">
<html>
<head><title>Gases</title></head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="ATOM"/>
<xsl:template match="ATOM[@STATE='GAS']">
<P><xsl:value-of select="."/></P>
</xsl:template>
</xsl:stylesheet>
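ElementTree's [tag='text'] test compares the complete text of the child, white space included, much as the XSLT string comparison does, so it's a handy way to see the pitfall. In this sketch the spaces around Argon's atomic number are made up for the demonstration. (In XPath itself, comparing against the number 18 rather than the string '18' would sidestep the problem, because converting a string to a number ignores leading and trailing white space.)

```python
import xml.etree.ElementTree as ET

# The spaces around 18 are made up for the demonstration.
DOC = ("<PERIODIC_TABLE>"
       "<ATOM STATE='GAS'><NAME>Neon</NAME><ATOMIC_NUMBER>10</ATOMIC_NUMBER></ATOM>"
       "<ATOM STATE='GAS'><NAME>Argon</NAME><ATOMIC_NUMBER> 18 </ATOMIC_NUMBER></ATOM>"
       "<ATOM STATE='SOLID'><NAME>Sodium</NAME><ATOMIC_NUMBER>11</ATOMIC_NUMBER></ATOM>"
       "</PERIODIC_TABLE>")
root = ET.fromstring(DOC)


def names(path):
    """NAME text of every ATOM selected by an ElementTree path."""
    return [atom.findtext("NAME") for atom in root.findall(path)]


print(names("ATOM[ATOMIC_NUMBER='10']"))  # exact match on element content
print(names("ATOM[ATOMIC_NUMBER='18']"))  # ' 18 ' is not '18': no match
print(names("ATOM[@STATE='GAS']"))        # attribute test, as in Listing 17-9
```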
You can use other XPath expressions for more complex matches. For example, you can select all elements whose names begin with "A" or all elements with an atomic number less than 100.
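ElementTree's path language has no functions, so this Python sketch does the filtering with ordinary code; the comments show XPath 1.0 expressions for the same tests (starts-with() and name() are standard XPath functions). Inside an XSLT attribute, the less-than sign must be written as &lt;. The two-atom document is a trimmed, illustrative sample.

```python
import xml.etree.ElementTree as ET

DOC = ("<PERIODIC_TABLE>"
       "<ATOM><NAME>Hydrogen</NAME><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>"
       "<ATOM><NAME>Seaborgium</NAME><ATOMIC_NUMBER>106</ATOMIC_NUMBER></ATOM>"
       "</PERIODIC_TABLE>")
root = ET.fromstring(DOC)

# XPath: //*[starts-with(name(), 'A')]  (distinct element names collected here)
a_names = sorted({e.tag for e in root.iter() if e.tag.startswith("A")})

# XPath: //ATOM[ATOMIC_NUMBER &lt; 100]  (as written inside an XSLT attribute)
light = [atom.findtext("NAME") for atom in root.iter("ATOM")
         if float(atom.findtext("ATOMIC_NUMBER")) < 100]

print(a_names)
print(light)
```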
The select attribute is used in xsl:apply-templates, xsl:value-of, xsl:for-each, xsl:copy-of, xsl:variable, xsl:param, and xsl:sort to specify exactly which nodes are operated on. The value of this attribute is an expression written in the XPath language. The XPath language provides a means of identifying a particular element, group of elements,
text fragment, or other part of an XML document. The XPath syntax is used both for XSLT and XPointer.
Cross-reference
XPointers are discussed in Chapter 20. XPath is discussed further in that chapter as well.
Expressions are a superset of the match patterns discussed in the last section. That is, all match patterns are expressions, but not all expressions are match patterns. Recall that match patterns enable you to match nodes by element name, child elements, descendants, and attributes, as well as by making simple tests on these items. XPath expressions allow you to select nodes through all these criteria but also by referring to ancestor nodes, parent nodes, sibling nodes, preceding nodes, and following nodes. Furthermore, expressions aren't limited to producing merely a list of nodes, but can also produce booleans, numbers, and strings.
Expressions are not limited to specifying the children and descendants of the current node. XPath provides a number of axes that you can use to select from different parts of the tree relative to some particular node in the tree called the context node. In XSLT, the context node is normally initialized to the current node that the template matches, though there are ways to change this. Table 17-2 summarizes the axes and their meanings.
Table 17-2: Expression Axes
| Axis | Selects From |
| ancestor | The parent of the context node, the parent of the parent of the context node, the parent of the parent of the parent of the context node, and so forth back to the root node |
| ancestor-or-self | The ancestors of the context node and the context node itself |
| attribute | The attributes of the context node |
| child | The immediate children of the context node |
| descendant | The children of the context node, the children of the children of the context node, and so forth |
| descendant-or-self | The context node itself and its descendants |
| following | All nodes that start after the end of the context node, excluding attribute and namespace nodes |
| following-sibling | All nodes that start after the end of the context node and have the same parent as the context node |
| namespace | The namespace nodes of the context node, one for each namespace in scope |
| parent | The unique parent node of the context node |
| preceding | All nodes that finish before the beginning of the context node, excluding attribute and namespace nodes |
| preceding-sibling | All nodes that start before the beginning of the context node and have the same parent as the context node |
| self | The context node |
Choosing an axis limits the expression so that it only selects from the set of nodes indicated in the second column of Table
17-2. The axis is generally followed by a double colon (::) and a node test that further winnows down this node set. For example, a node test may contain the name of the element to
be selected as in the following template rule:
<xsl:template match="ATOM">
<tr>
<td>
<xsl:value-of select="child::NAME"/>
</td>
<td>
<xsl:value-of select="child::ATOMIC_NUMBER"/>
</td>
<td>
<xsl:value-of select="child::ATOMIC_WEIGHT"/>
</td>
</tr>
</xsl:template>
The template rule matches ATOM elements. When an ATOM element is matched, that element becomes the context node. A NAME element, an ATOMIC_NUMBER element, and an ATOMIC_WEIGHT element are all selected from the children of that matched ATOM element and output as table cells. (If there is more than one of these elements, for example three NAME elements, then all are selected, but only the value of the first one is taken.)
The child axis doesn't let you do anything that you can't do with element names alone. In fact select="ATOMIC_WEIGHT" is just an abbreviated form of select="child::ATOMIC_WEIGHT". However, the other axes are a little more interesting.
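ElementTree elements don't keep a link to their parents, so modeling the other axes in Python starts with a parent map. This sketch, with hypothetical helper names, covers the parent, following-sibling, and preceding-sibling axes for element nodes only. The reverse axis is returned nearest node first, which is the order XPath uses when it counts positions on a reverse axis.

```python
import xml.etree.ElementTree as ET

DOC = ("<ATOM><NAME>Hydrogen</NAME><SYMBOL>H</SYMBOL>"
       "<ATOMIC_NUMBER>1</ATOMIC_NUMBER>"
       "<ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT></ATOM>")
root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map once
parents = {child: parent for parent in root.iter() for child in parent}


def parent_axis(node):
    """parent::* as a list: empty for the root element."""
    p = parents.get(node)
    return [] if p is None else [p]


def _siblings_and_index(node):
    p = parents.get(node)
    siblings = [node] if p is None else list(p)
    index = next(i for i, s in enumerate(siblings) if s is node)
    return siblings, index


def following_sibling_axis(node):
    """following-sibling::*, in document order."""
    siblings, i = _siblings_and_index(node)
    return siblings[i + 1:]


def preceding_sibling_axis(node):
    """preceding-sibling::*, nearest sibling first (reverse axis)."""
    siblings, i = _siblings_and_index(node)
    return siblings[:i][::-1]


symbol = root.find("SYMBOL")
print([e.tag for e in parent_axis(symbol)])
print([e.tag for e in following_sibling_axis(symbol)])
print([e.tag for e in preceding_sibling_axis(symbol)])
```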
Referring to the parent element is illegal in match patterns, but not in expressions. To refer to the parent, you use the
parent axis. For example, this template matches BOILING_POINT elements but outputs the value of the parent ATOM element:
<xsl:template match="BOILING_POINT">
<P><xsl:value-of select="parent::ATOM"/></P>
</xsl:template>
Some radioactive atoms such as seaborgium have half-lives so short that bulk properties such as the boiling point and melting
point can't be measured. Therefore, not all ATOM elements necessarily have BOILING_POINT child elements. The above rule enables you to write a template that only outputs those elements that actually have boiling
points. Expanding on this example, Listing 17-10 matches the MELTING_POINT elements but actually outputs the parent ATOM element using parent::ATOM.
Listing 17-10: A style sheet that outputs only those elements with known melting points
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<xsl:apply-templates select="PERIODIC_TABLE"/>
</body>
</html>
</xsl:template>
<xsl:template match="PERIODIC_TABLE">
<h1>Elements with known Melting Points</h1>
<xsl:apply-templates select=".//MELTING_POINT"/>
</xsl:template>
<xsl:template match="MELTING_POINT">
<p>
<xsl:value-of select="parent::ATOM"/>
</p>
</xsl:template>
</xsl:stylesheet>
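Once in a while, you may need to select the nearest ancestor of an element with a given type. The ancestor axis selects all of the context node's ancestors that pass the node test; adding the position predicate [1] picks the nearest one, because positions on the ancestor axis count outward from the context node. (Without the [1], xsl:value-of would use the first of these ancestors in document order, which is the outermost one.) For example, this rule inserts the value of the nearest PERIODIC_TABLE element that contains the matched SYMBOL element.
<xsl:template match="SYMBOL">
<xsl:value-of select="ancestor::PERIODIC_TABLE[1]"/>
</xsl:template>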
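The order of nodes on the ancestor axis is easy to get wrong. A position predicate such as [1] counts outward from the context node, so ancestor::X[1] is the nearest ancestor, but xsl:value-of converts a node set by taking its first node in document order, which for ancestors is the outermost one. This Python sketch shows the difference with hypothetical nested SECTION elements, since the periodic table never nests PERIODIC_TABLE elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting, used only because PERIODIC_TABLE elements never nest
DOC = ("<SECTION id='outer'><SECTION id='inner'>"
       "<SYMBOL>H</SYMBOL></SECTION></SECTION>")
root = ET.fromstring(DOC)
parents = {child: parent for parent in root.iter() for child in parent}


def ancestors(node, tag):
    """ancestor::tag, nearest first (the order positions count in)."""
    found = []
    node = parents.get(node)
    while node is not None:
        if node.tag == tag:
            found.append(node)
        node = parents.get(node)
    return found


symbol = root.find(".//SYMBOL")
on_axis = ancestors(symbol, "SECTION")
nearest = on_axis[0]                # like ancestor::SECTION[1]
first_in_doc_order = on_axis[-1]    # what a plain xsl:value-of would use
print(nearest.get("id"), first_in_doc_order.get("id"))
```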
The ancestor-or-self axis behaves like the ancestor axis except that if the context node passes the node test, then it will be returned as well. For example, this rule matches
all elements. If the matched element is a PERIODIC_TABLE, then that very PERIODIC_TABLE is selected in xsl:value-of.
<xsl:template match="*">
<xsl:value-of select="ancestor-or-self::PERIODIC_TABLE"/>
</xsl:template>
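The only change from the ancestor axis is where the walk starts: the context node is tested before its ancestors. A minimal Python sketch with xml.dom.minidom, on a hypothetical one-atom document, makes that visible; ancestor_or_self() is an invented helper name.

```python
from xml.dom import Node, minidom


def ancestor_or_self(node, tag):
    """Elements named TAG on the ancestor-or-self axis, nearest first:
    the node itself is tested before any of its ancestors."""
    found = []
    current = node
    while current is not None and current.nodeType == Node.ELEMENT_NODE:
        if current.tagName == tag:
            found.append(current)
        current = current.parentNode
    return found


doc = minidom.parseString(
    "<PERIODIC_TABLE><ATOM><NAME>Hydrogen</NAME></ATOM></PERIODIC_TABLE>")
table = doc.documentElement
name = doc.getElementsByTagName("NAME")[0]

# Matched element is itself a PERIODIC_TABLE: it selects itself.
print(ancestor_or_self(table, "PERIODIC_TABLE")[0] is table)  # True
# Matched element is a NAME: the PERIODIC_TABLE is found further up.
print(ancestor_or_self(name, "PERIODIC_TABLE")[0] is table)   # True
# The table is not an ATOM and has no ATOM ancestor.
print(len(ancestor_or_self(table, "ATOM")))                    # 0
```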
Instead of the name of a node, the axis may be followed by one of these four node-type functions:
comment()
text()
processing-instruction()
node()
The comment() function selects a comment node. The text() function selects a text node. The processing-instruction() function selects a processing instruction node, and the node() function selects any type of node. (The * wildcard, by contrast, selects only element nodes on most axes; on the attribute axis it selects attributes.) The processing-instruction() node type can also take an optional argument specifying the name of the processing instruction to select.
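A DOM tree keeps comments, processing instructions, and text as separate nodes, which makes it a convenient place to see what each node test accepts. The sketch below uses Python's xml.dom.minidom on a small hypothetical ATOM element with mixed content and filters its children the way each node test would on the child axis. The predicate function names are invented for this example.

```python
from xml.dom import Node, minidom

# Hypothetical element mixing a comment, a processing instruction, text, and an element.
DOC = "<ATOM><!-- lightest element --><?render bold?>Hydrogen<SYMBOL>H</SYMBOL></ATOM>"


def is_comment(node):          # comment()
    return node.nodeType == Node.COMMENT_NODE


def is_text(node):             # text()
    return node.nodeType == Node.TEXT_NODE


def is_pi(node, target=None):  # processing-instruction() or processing-instruction('target')
    return (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
            and (target is None or node.target == target))


def is_any(node):              # node(): every kind of node passes
    return True


def is_element(node):          # * on the child axis: element nodes only
    return node.nodeType == Node.ELEMENT_NODE


atom = minidom.parseString(DOC).documentElement
children = atom.childNodes

print([c.data for c in children if is_comment(c)])          # [' lightest element ']
print([c.data for c in children if is_text(c)])             # ['Hydrogen']
print([c.target for c in children if is_pi(c, "render")])   # ['render']
print(len([c for c in children if is_any(c)]))              # 4
print([c.tagName for c in children if is_element(c)])       # ['SYMBOL']
```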
You can use the / and // operators to string expressions together. For example, Listing 17-11 prints a table of element names, atomic numbers, and melting points for only those elements that have melting points. It does this by selecting the parent of each MELTING_POINT element, then finding that parent's NAME and ATOMIC_NUMBER children with select="parent::*/child::NAME" and select="parent::*/child::ATOMIC_NUMBER".
Listing 17-11: A table of melting point versus atomic number
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/PERIODIC_TABLE">
<html>
<body>
<h1>Atomic Number vs. Melting Point</h1>
<table>
<th>Element</th>
<th>Atomic Number</th>
<th>Melting Point</th>
<xsl:apply-templates select="child::ATOM"/>
</table>
</body>
</html>
</xsl:template>
<xsl:template match="ATOM">
<xsl:apply-templates
select="child::MELTING_POINT"/>
</xsl:template>
<xsl:template match="MELTING_POINT">
<tr>
<td>
<xsl:value-of select="parent::*/child::NAME"/>
</td>
<td>
<xsl:value-of
select="parent::*/child::ATOMIC_NUMBER"/>
</td>
<td>
<xsl:value-of select="self::*"/>
<xsl:value-of select="attribute::UNITS"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
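To make the step-by-step navigation of Listing 17-11 concrete, here is a rough Python equivalent built on xml.etree.ElementTree, in which each comment names the XPath step it imitates. The document is a hypothetical stand-in with an illustrative value, melting_point_rows() is an invented name, and the function returns table rows as tuples rather than HTML. As in the style sheet, the units are appended directly to the number with no space.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document (melting point value is illustrative).
DOC = """<PERIODIC_TABLE>
  <ATOM>
    <NAME>Hydrogen</NAME>
    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
    <MELTING_POINT UNITS="Kelvin">14.01</MELTING_POINT>
  </ATOM>
  <ATOM>
    <NAME>Helium</NAME>
    <ATOMIC_NUMBER>2</ATOMIC_NUMBER>
  </ATOM>
</PERIODIC_TABLE>"""


def melting_point_rows(table):
    """Return (name, atomic number, melting point + units) for atoms with a melting point."""
    parent_of = {child: parent for parent in table.iter() for child in parent}
    rows = []
    for atom in table.findall("ATOM"):                  # child::ATOM
        for mp in atom.findall("MELTING_POINT"):        # child::MELTING_POINT
            parent = parent_of[mp]                      # parent::*
            name = parent.findtext("NAME")              # parent::*/child::NAME
            number = parent.findtext("ATOMIC_NUMBER")   # parent::*/child::ATOMIC_NUMBER
            value = "".join(mp.itertext())              # self::*
            units = mp.get("UNITS", "")                 # attribute::UNITS
            rows.append((name, number, value + units))
    return rows


print(melting_point_rows(ET.fromstring(DOC)))  # [('Hydrogen', '1', '14.01Kelvin')]
```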
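This is not the only way to solve the problem. Another is to use the preceding-sibling and following-sibling axes. If you don't know whether the NAME and ATOMIC_NUMBER elements come before or after the MELTING_POINT element, combine the two axes with the union operator |; xsl:value-of then uses whichever matching sibling comes first in document order. The necessary template rule for the MELTING_POINT element looks like this: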
<xsl:template match="MELTING_POINT">
<tr>
<td>
<xsl:value-of
select="preceding-sibling::NAME
| following-sibling::NAME"/>
</td>
<td>
<xsl:value-of
select="preceding-sibling::ATOMIC_NUMBER
| following-sibling::ATOMIC_NUMBER"/>
</td>
<td>
<xsl:value-of select="self::*"/>
<xsl:value-of select="attribute::UNITS"/>
</td>
</tr>
</xsl:template>
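The union works because xsl:value-of, given a node set, uses the first node in document order, whichever side of the MELTING_POINT it is on. Here is a Python sketch of the same idea with xml.dom.minidom; sibling_named() is a hypothetical helper, and the two hypothetical ATOM elements deliberately put NAME on opposite sides of MELTING_POINT (values illustrative).

```python
from xml.dom import Node, minidom

# Hypothetical data: NAME precedes MELTING_POINT in one ATOM and follows it in the other.
DOC = ("<PERIODIC_TABLE>"
       "<ATOM><NAME>Hydrogen</NAME><MELTING_POINT>14.01</MELTING_POINT></ATOM>"
       "<ATOM><MELTING_POINT>0.95</MELTING_POINT><NAME>Helium</NAME></ATOM>"
       "</PERIODIC_TABLE>")


def sibling_named(node, tag):
    """preceding-sibling::TAG | following-sibling::TAG, reduced to the first
    match in document order (what xsl:value-of would use), or None."""
    for sibling in node.parentNode.childNodes:   # childNodes are in document order
        if (sibling is not node
                and sibling.nodeType == Node.ELEMENT_NODE
                and sibling.tagName == tag):
            return sibling
    return None


doc = minidom.parseString(DOC)
names = [sibling_named(mp, "NAME").firstChild.data
         for mp in doc.getElementsByTagName("MELTING_POINT")]
print(names)  # ['Hydrogen', 'Helium']
```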
The various axes in Table 17-2 are a bit too wordy for comfortable typing. XPath also defines an abbreviated syntax that can substitute for the most common of these axes and is used more often in practice. Table 17-3 shows the full and abbreviated equivalents.
Table 17-3: Abbreviated Syntax for XPath Expressions
| Abbreviation | Full |
|---|---|
| . | self::node() |
| .. | parent::node() |
| name | child::name |
| @name | attribute::name |
| // | /descendant-or-self::node()/ |
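Python's xml.etree.ElementTree accepts several of these abbreviations in its find() and findall() paths, which makes it a quick way to experiment with them. Its XPath support is limited, though: paths must be relative (so // is written .//), and @ works only inside a predicate such as [@UNITS='Kelvin'], with attribute values read through get(). The document below is a hypothetical stand-in with an illustrative value.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document (value illustrative).
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><MELTING_POINT UNITS="Kelvin">14.01</MELTING_POINT></ATOM>
  <ATOM><NAME>Helium</NAME></ATOM>
</PERIODIC_TABLE>"""

root = ET.fromstring(DOC)

selves = root.findall(".")                             # .     self::node()
names = [n.text for n in root.findall("ATOM/NAME")]    # name  child::name
points = root.findall(".//MELTING_POINT")              # //    /descendant-or-self::node()/
owners = root.findall(".//MELTING_POINT/..")           # ..    parent::node()
kelvin = root.findall(".//*[@UNITS='Kelvin']")         # @name, only inside a predicate
units = points[0].get("UNITS")                         # the attribute value itself

print(selves == [root], names, len(points), owners[0].findtext("NAME"), len(kelvin), units)
# True ['Hydrogen', 'Helium'] 1 Hydrogen 1 Kelvin
```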
Listing 17-12 rewrites Listing 17-11 using the abbreviated syntax. The two style sheets produce exactly the same output.
Listing 17-12: A table of melting point versus atomic number using the abbreviated syntax
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/PERIODIC_TABLE">
<html>
<body>
<h1>Atomic Number vs. Melting Point</h1>
<table>
<th>Element</th>
<th>Atomic Number</th>
<th>Melting Point</th>
<xsl:apply-templates select="ATOM"/>
</table>
</body>
</html>
</xsl:template>
<xsl:template match="ATOM">
<xsl:apply-templates
select="MELTING_POINT"/>
</xsl:template>
<xsl:template match="MELTING_POINT">
<tr>
<td>
<xsl:value-of
select="../NAME"/>
</td>
<td>
<xsl:value-of
select="../ATOMIC_NUMBER"/>
</td>
<td>
<xsl:value-of select="."/>
<xsl:value-of select="@UNITS"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
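Match patterns can use only the child and attribute axes, in either their full or abbreviated form, together with the / and // operators. The other axes of Table 17-2, and the . and .. abbreviations, are restricted to expressions, although they may appear inside a pattern's predicates.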
Every expression evaluates to a single value. For example, the expression 3 + 2 evaluates to the value 5. The expressions used so far have all evaluated to node sets. However, an XSLT expression can evaluate to any of five types:
Node sets
Booleans
Numbers
Strings
Result tree fragments
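One way to see how these types differ is to look at how each converts to a Boolean, for example in the test of an xsl:if. The Python function below is a minimal sketch of the XPath 1.0 boolean() rules, assuming node sets are modelled as Python lists; result tree fragments are not modelled, and xpath_boolean() is an invented name.

```python
import math


def xpath_boolean(value):
    """Sketch of XPath 1.0 boolean() conversion.

    Node sets are modelled as Python lists (an assumption for this sketch).
    """
    if isinstance(value, bool):                 # Booleans pass through unchanged
        return value
    if isinstance(value, (int, float)):         # Numbers: false only for zero and NaN
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):                  # Strings: true if non-empty
        return len(value) > 0
    if isinstance(value, list):                 # Node sets: true if non-empty
        return len(value) > 0
    raise TypeError(f"unsupported value: {value!r}")


print(xpath_boolean(3 + 2), xpath_boolean(float("nan")), xpath_boolean(""), xpath_boolean([]))
# True False False False
```

Note that the string "0" is true under these rules, because only its length matters.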
A node set is an unordered group of nodes from the input document. The axes in Table 17-2 all return a node set containing the nodes they match. Which nodes are in the node set depends on the context node, the node test, and the axis.
For example, when the context node is the PERIODIC_TABLE element of Listing 17-1, the XPath expression child::ATOM returns a node set that contains both ATOM elements in that document. With the same context node, the XPath expression child::ATOM/child::NAME returns a node set containing the two element nodes <NAME>Hydrogen</NAME> and <NAME>Helium</NAME>.
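Python's xml.etree.ElementTree can evaluate the abbreviated forms of these two paths (ATOM and ATOM/NAME), which gives a quick way to check how the result depends on the context node. The document below is a cut-down, hypothetical stand-in for Listing 17-1 containing only NAME and ATOMIC_NUMBER children.

```python
import xml.etree.ElementTree as ET

# Cut-down hypothetical stand-in for Listing 17-1.
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>
  <ATOM><NAME>Helium</NAME><ATOMIC_NUMBER>2</ATOMIC_NUMBER></ATOM>
</PERIODIC_TABLE>"""

table = ET.fromstring(DOC)

atoms = table.findall("ATOM")                          # child::ATOM from PERIODIC_TABLE
names = [n.text for n in table.findall("ATOM/NAME")]   # child::ATOM/child::NAME
# Same path, different context node: an ATOM has no ATOM children.
from_atom = atoms[0].findall("ATOM/NAME")

print(len(atoms), names, from_atom)  # 2 ['Hydrogen', 'Helium'] []
```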
The context node is a member of the context node list. The context node list is the group of nodes being processed together, generally the nodes selected by a single xsl:apply-templates or xsl:for-each instruction. For instance, when Listing 17-12 is applied to Listing 17-1, the ATOM template is invoked twice, first for the hydrogen atom, then for the helium atom. The first time it's invoked, the context node is the hydrogen ATOM element. The second time it's invoked, the context node is the helium ATOM element. Both times, however, the context node list is the set containing both the hydrogen and helium ATOM elements.
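The relationship between the context node, its position, and the size of the context node list can be mimicked with an ordinary loop. In this Python sketch, apply_templates() and atom_template() are hypothetical stand-ins for xsl:apply-templates and a template rule; the loop hands each template call the current node together with the values position() and last() would return.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down document with two atoms.
DOC = """<PERIODIC_TABLE>
  <ATOM><NAME>Hydrogen</NAME></ATOM>
  <ATOM><NAME>Helium</NAME></ATOM>
</PERIODIC_TABLE>"""


def apply_templates(nodes, template):
    """Call template once per node; every call shares the same context node list."""
    size = len(nodes)                                          # last()
    return [template(node, position, size)
            for position, node in enumerate(nodes, start=1)]   # position() is 1-based


def atom_template(atom, position, last):
    return f"{position} of {last}: {atom.findtext('NAME')}"


table = ET.fromstring(DOC)
print(apply_templates(table.findall("ATOM"), atom_template))
# ['1 of 2: Hydrogen', '2 of 2: Helium']
```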
Table 17-4 lists a number of functions that operate on node sets, either as arguments or as the context node.
Table 17-4: Functions That Operate on or Return Node Sets
| Function | Return Type | Returns |
|---|---|---|
| position() | number | The position of the context node in the context node list; the first node in the list has position 1 |
| last() | number | The number of nodes in the context node list; this is the same as the position of the last node in the list |
| count(node-set) | number | The number of nodes in the node-set argument |
| id(object) | node set | A node set containing all the elements anywhere in the same document that have an ID named in the argument (a whitespace-separated list of IDs); the empty set if no element has the specified ID |
| key(string name, object value) | node set | A node set containing all nodes in this document that have a key with the specified value. Keys are set with the top-level xsl:key element. |
| document(object URI, node-set base) | node set | A node set in the document referred to by the URI; the nodes are chosen from the named anchor or XPointer used by the URI. If there is no named anchor or XPointer, then the node set contains the root node of the named document. Relative URIs are relative to the base URI given in the second argument. If the second argument is omitted, then relative URIs are relative to the URI of the style sheet (not the source document!). |
| local-name(node-set) | string | The local name (everything after the namespace prefix) of the first node in the node-set argument; can be used without any arguments to get the local name of the context node |
| namespace-uri(node-set) | string | The URI of the namespace of the first node in the node set; can be used without any arguments to get the URI of the namespace of the context node; returns an empty string if the node is not in a namespace |
| name(node-set) | string | The qualified name (both prefix and local part) of the first node in the node-set argument; can be used without an argument to get the qualified name of the context node |
| generate-id(node-set) | string | A unique identifier for the first node in the node-set argument; can be used without an argument to generate an ID for the context node |
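Several of these functions have close counterparts in the DOM. The Python sketch below uses xml.dom.minidom, which keeps each element's prefix, local name, and namespace URI, to imitate count(), name(), local-name(), and namespace-uri() on a small hypothetical document. The xpath_* helper names are invented for the example, and each helper takes a Python list standing in for a node set.

```python
from xml.dom import minidom

# Hypothetical document: one element in the XSLT namespace, one in no namespace.
DOC = ('<root xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
       '<xsl:template/><ATOM/></root>')


def xpath_count(node_set):
    return len(node_set)


def xpath_name(node_set):
    """Qualified name (prefix:local) of the first node, or "" for an empty set."""
    return node_set[0].tagName if node_set else ""


def xpath_local_name(node_set):
    """Local name of the first node, or "" for an empty set."""
    return node_set[0].localName if node_set else ""


def xpath_namespace_uri(node_set):
    """Namespace URI of the first node; "" when it is in no namespace."""
    return (node_set[0].namespaceURI or "") if node_set else ""


root = minidom.parseString(DOC).documentElement
children = list(root.childNodes)   # [xsl:template, ATOM]
template, atom = children

print(xpath_count(children))                                  # 2
print(xpath_name([template]), xpath_local_name([template]))   # xsl:template template
print(xpath_namespace_uri([template]))                        # http://www.w3.org/1999/XSL/Transform
print(repr(xpath_namespace_uri([atom])))                      # ''
```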
If an argument of the wrong type is passed to one of these functions, then XSLT will attempt to convert that argument to the correct type; for instance, by converting the number 12 to the string "12". However, no other type can be converted to a node set.
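The number-to-string conversion follows a few simple rules: integral values print without a decimal point, and the special values print as NaN and Infinity. The Python sketch below approximates them and also shows the one-way nature of node sets. It is a simplification under stated assumptions: lists stand in for node sets, the function names are invented, and Python's repr() may fall back to exponent notation where XPath never does.

```python
import math


def xpath_string_from_number(n):
    """Simplified XPath 1.0 number-to-string conversion."""
    if math.isnan(n):
        return "NaN"
    if math.isinf(n):
        return "Infinity" if n > 0 else "-Infinity"
    if n == int(n):
        return str(int(n))   # integers print without a decimal point; -0 becomes "0"
    return repr(n)           # simplification: Python may use exponent notation, XPath never does


def require_node_set(value):
    """XPath 1.0 has no conversion to node sets, so anything else is an error."""
    if not isinstance(value, list):   # lists stand in for node sets here
        raise TypeError(f"cannot convert {type(value).__name__} to a node set")
    return value


print(xpath_string_from_number(12.0))  # 12
print(xpath_string_from_number(2.5))   # 2.5
```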
The position() function can be used to determine an element's position within the context node list. Listing 17-13 is a style sheet that prefixes each atom's name with its position in the document using <xsl:value-of select="position()"/>.
Listing 17-13: A style sheet that numbers the atoms in the order they appear in the document
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:tem