Chapter 16. XPath

Table of Contents

Queries
The XPath Data Model
Location Paths
Axes
Node tests
Predicates
Compound Location Paths
Absolute Location Paths
Abbreviated Location paths
Combining location paths
Expressions
Literals
Operators
Functions
XPath Engines
XPath with Saxon
XPath with Xalan
DOM Level 3 XPath
Namespace Bindings
Snapshots
Compiled Expressions
Jaxen
Summary

Much of the code in this book has involved navigating the tree structure of an XML document to find particular nodes. For example, the XML-RPC servlet in Chapter 10 read a client request looking for int elements. Such code can become quite involved and fragile if you aren’t very careful. As the code walks down the tree hierarchy, loading one child after the other, a single misplaced or misnamed element may cause the program to fail. If an element isn’t where it’s expected to be, the chain of method calls that gives directions to the desired elements will be broken. What’s needed is a way to specify which nodes a program needs without explicitly specifying how the program navigates to those nodes.

XPath is a fourth generation declarative language for locating nodes in XML documents. An XPath location path says which nodes from the document you want. It says nothing about what algorithm is used to find these nodes. You simply pass an XPath expression to a method, and the XPath engine is responsible for figuring out how to find all the nodes satisfying that expression. This is much more robust than writing the detailed search and navigation code yourself using DOM, SAX, or JDOM. XPath searches often succeed even when the document format is not quite what you expected. For example, a comment in the middle of a paragraph of text may break DOM code that expects to see contiguous text. XPath wouldn’t be fazed by this. Many XPath expressions are resistant even to much more significant alterations such as changing the names or namespaces of ancestor elements, reordering the children of an element, or even adding or removing entire levels of the tree hierarchy.
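The point about comments can be illustrated with a small sketch. It assumes the javax.xml.xpath API bundled with the JDK, which is not one of the engines this chapter covers, and it uses a hypothetical one-paragraph document invented for the illustration. The class and method names are likewise made up here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CommentRobustness {

    // Hypothetical sample document: a comment splits the paragraph text in two.
    static final String XML = "<p>Hello <!-- editorial note -->world</p>";

    // Parses a string into a DOM Document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse sample document", ex);
        }
    }

    // Hand-written navigation that assumes the paragraph holds a single text node.
    static String firstChildText(Document doc) {
        return doc.getDocumentElement().getFirstChild().getNodeValue();
    }

    // XPath asks for the string-value of p, which concatenates every
    // descendant text node and skips the comment.
    static String xpathText(Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate("string(/p)", doc);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not evaluate expression", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("DOM first child:  [" + firstChildText(doc) + "]");
        System.out.println("XPath string(/p): [" + xpathText(doc) + "]");
    }
}
```

With a default, non-coalescing parser the DOM call sees only the text before the comment, while the XPath expression returns the full paragraph text. The DOM code could of course be made to loop over all the children, but that is exactly the kind of navigation detail XPath lets you leave out.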

Broadly speaking, using XPath in a Java program is like using SQL in a Java program. To extract information from a database, you write a SQL statement indicating what information you want and you ask JDBC to fetch it for you. You neither know nor care how JDBC communicates with the database. Similarly with XML, you write an XPath expression indicating what information you want from an XML document and ask the XPath engine to fetch it, without concerning yourself with the exact algorithms used to search the XML document.

Queries

XPath can be thought of as a query language like SQL. However, rather than extracting information from a database, it extracts information from an XML document. An example should help make this more concrete. Consider the simple weather report document in Example 16.1.

Example 16.1. Weather data in XML

<?xml version="1.0" encoding="ISO-8859-1"?>
<weather time="2002-06-06T15:35:00-05:00">
  <report latitude="41.2° N" longitude="71.6° W">
    <locality>Block Island</locality>
    <temperature units="°C">16</temperature>
    <humidity>88%</humidity>
    <dewpoint units="°C">14</dewpoint>
    <wind>
      <direction>NE</direction>
      <speed units="km/h">16.1</speed>
      <gust units="km/h">31</gust>
    </wind>
    <pressure units="hPa">1014</pressure>
    <condition>overcast</condition>
    <visibility>13 km</visibility>
  </report>
  <report latitude="34.1° N" longitude="118.4° W">
    <locality>Santa Monica</locality>
    <temperature units="°C">19</temperature>
    <humidity>79%</humidity>
    <dewpoint units="°C">16</dewpoint>
    <wind>
      <direction>WSW</direction>
      <speed units="km/h">14.5</speed>
    </wind>
    <pressure units="hPa">1010</pressure>
    <condition>hazy</condition>
    <visibility>5 km</visibility>
  </report>  
</weather>

Here are some XPath expressions that identify particular parts of this document:

  • /weather/report is an XPath expression that selects the two report elements.

  • /weather/report[1] is an XPath expression that selects the first report element.

  • /weather/report/temperature is an XPath expression that selects the two temperature elements.

  • /weather/report[locality="Santa Monica"] is an XPath expression that selects the second report element.

  • //report[locality="Block Island"]/attribute::longitude is an XPath expression that selects the longitude attribute of the first report element.

  • /child::weather/child::report/child::wind/child::* is an XPath expression that selects all the direction, speed, and gust elements.

  • 9 * number(/weather/report[locality="Block Island"]/temperature) div 5 + 32 is an XPath expression that returns the temperature on Block Island in degrees Fahrenheit.

  • /descendant::* is an XPath expression that selects all the elements in the document.
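As a rough sketch of how a few of these expressions might be evaluated from Java, the following program again assumes the JDK's javax.xml.xpath API rather than any of the engines discussed later in this chapter. To stay short it embeds an abridged copy of Example 16.1 that keeps only the locality, temperature, and wind elements. The class name and helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WeatherQueries {

    // Abridged copy of Example 16.1 (an assumption made to keep the sketch short).
    static final String WEATHER =
        "<weather time='2002-06-06T15:35:00-05:00'>"
      + "<report latitude='41.2\u00B0 N' longitude='71.6\u00B0 W'>"
      + "<locality>Block Island</locality>"
      + "<temperature units='\u00B0C'>16</temperature>"
      + "<wind><direction>NE</direction><speed units='km/h'>16.1</speed>"
      + "<gust units='km/h'>31</gust></wind>"
      + "</report>"
      + "<report latitude='34.1\u00B0 N' longitude='118.4\u00B0 W'>"
      + "<locality>Santa Monica</locality>"
      + "<temperature units='\u00B0C'>19</temperature>"
      + "<wind><direction>WSW</direction><speed units='km/h'>14.5</speed></wind>"
      + "</report>"
      + "</weather>";

    // Parses the sample document and evaluates one expression against its root.
    static Object evaluate(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(WEATHER)));
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, returnType);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not evaluate " + expression, ex);
        }
    }

    // Number of nodes a location path selects.
    static int selectCount(String expression) {
        return ((NodeList) evaluate(expression, XPathConstants.NODESET)).getLength();
    }

    // String-value of the first node a location path selects.
    static String selectString(String expression) {
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    // Numeric result of an arithmetic expression.
    static double selectNumber(String expression) {
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    public static void main(String[] args) {
        System.out.println("report elements: " + selectCount("/weather/report"));
        System.out.println("wind children:   "
            + selectCount("/child::weather/child::report/child::wind/child::*"));
        System.out.println("longitude:       "
            + selectString("//report[locality='Block Island']/attribute::longitude"));
        System.out.println("Fahrenheit:      " + selectNumber(
            "9 * number(/weather/report[locality='Block Island']/temperature) div 5 + 32"));
    }
}
```

Notice that the same expression strings from the list above are passed through unchanged. Only the requested return type (node-set, string, or number) differs from one call to the next.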

Like SQL, XPath expressions are used in many different contexts including:

  • Dedicated query tools like Alex Chaffee’s XPath Explorer. Figure 16.1 shows this tool evaluating the expression /weather/report/temperature against Example 16.1.

  • Native XML databases like the Apache XML Project’s Xindice and Software AG’s Tamino.

  • Broader languages like XSLT and XQuery, which use XPath as a component.

  • Last but certainly not least, your own Java programs that read XML documents, where XPath serves as a search component.

Figure 16.1. XPath Explorer


Copyright 2001, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified June 02, 2002