Expressions

Not all XPath expressions are location paths. In fact, you’ve already seen several that weren’t. The content of the square brackets in a location step predicate is a more generic form of XPath expression. Each XPath 1.0 expression returns one of these four types:

string

A sequence of zero or more Unicode characters. This is not quite the same thing as a Java String, which is a sequence of UTF-16 code units. A single Unicode character from outside Unicode’s Basic Multilingual Plane (BMP) occupies two UTF-16 code units (a surrogate pair). XPath strings that contain characters from outside the BMP will therefore have smaller lengths than the equivalent Java strings.

number

An IEEE-754 double. This is the same as Java’s double primitive data type for all intents and purposes.

boolean

Semantically the same as Java’s boolean type. However, unlike Java, XPath converts numbers to booleans wherever a boolean is expected, so 1 and 0 can be used to represent true and false respectively.

node-set

An unordered collection of nodes from an XML document without any duplicates. Since a node-set is a mathematical set, there is no fundamental ordering defined on the set. However, most node-sets have a natural document order that’s derived from the order of the nodes in the set in the input document. This is similar to how the set of integers {1, 4, -76, 23} is unordered. However, the individual elements in the set can be compared to each other and sorted if desired. In practice, most APIs use lists rather than sets to represent node-sets, and these lists are sorted in either document order or reverse document order, depending on how they were created.
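The difference between XPath's and Java's notions of string length, described under the string type above, can be seen in plain Java. The following is a minimal sketch that uses only java.lang; the class name StringLengthDemo and the choice of U+1D11E as the sample character are illustrative assumptions, not anything defined by XPath.

```java
final class StringLengthDemo {

    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so Java stores it
    // as a surrogate pair (two UTF-16 code units).
    static final String CLEF = new String(Character.toChars(0x1D11E));

    /** Length as Java's String.length() reports it: UTF-16 code units. */
    static int javaLength(String s) {
        return s.length();
    }

    /** Length as XPath 1.0 defines it: whole Unicode characters. */
    static int xpathLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "A" + CLEF;
        System.out.println(javaLength(s));   // prints 3
        System.out.println(xpathLength(s));  // prints 2
    }
}
```

For strings made up only of BMP characters the two methods return the same value.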

Different XPath engines map these four types to different Java classes and primitive data types. For example, Jaxen uses the normal Java classes Boolean, List, Double, and String, whereas jd.xslt uses the custom types XBoolean, XNodeSet, XNumber, and XString; and DOM3 XPath uses a single XPathResult interface that can hold any of the four XPath types. In all cases, there are straightforward methods to convert these to the usual Java primitive types like boolean and double.
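As one more illustration of such a mapping, here is a sketch that uses the javax.xml.xpath package bundled with current JDKs. That API is not one of the engines named above, so treat its use here as an assumption made for the sake of a self-contained example. It maps the four XPath types to String, Double, Boolean, and org.w3c.dom.NodeList. The sample document and the class name XPathTypesDemo are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathTypesDemo {

    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<library><book year='1999'>XML Bible</book>"
      + "<book year='2002'>Processing XML with Java</book></library>";

    /** Evaluates expr against XML and returns the result as the requested XPath type. */
    static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // string -> java.lang.String
        String title = (String) eval("string(/library/book[2])", XPathConstants.STRING);
        // number -> java.lang.Double
        Double count = (Double) eval("count(/library/book)", XPathConstants.NUMBER);
        // boolean -> java.lang.Boolean
        Boolean recent = (Boolean) eval("/library/book/@year > 2000", XPathConstants.BOOLEAN);
        // node-set -> org.w3c.dom.NodeList, in document order
        NodeList books = (NodeList) eval("/library/book", XPathConstants.NODESET);

        System.out.println(title);              // prints Processing XML with Java
        System.out.println(count);              // prints 2.0
        System.out.println(recent);             // prints true
        System.out.println(books.getLength());  // prints 2
    }
}
```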

The XPath expression syntax includes literal forms for strings and numbers as well as operators and functions for manipulating all four XPath data types.

The primary use-case for XPath literals and operators is predicates. Although you can use these to perform simple arithmetic and string operations with XPath expressions, you’re more likely to do complex work of this sort in the Java code. However, the functions can perform some very useful operations on node-sets that would be much harder to implement in SAX, DOM, or JDOM.

Literals

XPath defines literal forms for strings and numbers. Numbers have more or less the same form as double literals in Java. That is, they look like 72.5, -72.5, .5321, and so forth. XPath only uses floating point arithmetic, so integers like 42, -23, and 0 are also number literals. However, XPath does not recognize scientific notation such as 5.5E-10 or 6.022E23.

XPath string literals are enclosed in single or double quotes. For example, "red" and 'red' are different representations for the same string literal containing the word red.

There are no boolean or node-set literals. However, the true() and false() functions sometimes substitute for the lack of boolean literals.
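When XPath expressions are written inside Java string literals, the two quoting styles matter in practice. The sketch below assumes the JDK's javax.xml.xpath API as the engine (an assumption for illustration; any XPath 1.0 engine should agree on the results) and evaluates a few literals without any input document. The class name XPathLiteralsDemo is hypothetical.

```java
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

final class XPathLiteralsDemo {

    /** Evaluates a context-free expression and returns it as the requested type. */
    static Object eval(String expr, QName type) {
        try {
            // No context node is needed: these expressions select nothing from a document.
            return XPathFactory.newInstance().newXPath().evaluate(expr, (Object) null, type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Number literals: every one is a double, even when written as an integer.
        System.out.println(eval("72.5", XPathConstants.NUMBER));  // prints 72.5
        System.out.println(eval("42", XPathConstants.NUMBER));    // prints 42.0

        // Single-quoted XPath literals need no escaping inside a Java string.
        System.out.println(eval("'red'", XPathConstants.STRING)); // prints red

        // Double-quoted XPath literals must be escaped for Java,
        // but they can contain an apostrophe.
        System.out.println(eval("\"it's red\"", XPathConstants.STRING)); // prints it's red

        // Both quoting styles denote the same string.
        System.out.println(eval("'red' = \"red\"", XPathConstants.BOOLEAN)); // prints true

        // true() stands in for the missing boolean literal.
        System.out.println(eval("true()", XPathConstants.BOOLEAN)); // prints true
    }
}
```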

Operators

XPath provides the following operators for basic floating point arithmetic:

+      addition
-      subtraction
*      multiplication
div    division
mod    taking the remainder

All five behave the same as the equivalent operators applied to doubles in Java; in particular, div never performs integer division. The keywords div and mod are used instead of / and % respectively.
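A short sketch of the arithmetic operators in action follows. It assumes the JDK's javax.xml.xpath API as the engine, which is a choice made for illustration only; the class name XPathArithmeticDemo and its num helper are hypothetical.

```java
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

final class XPathArithmeticDemo {

    /** Evaluates a context-free XPath expression as a number. */
    static double num(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, (Object) null, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(num("2 + 3 * 4"));  // prints 14.0 (usual precedence)
        System.out.println(num("7 div 2"));    // prints 3.5 (never integer division)
        System.out.println(num("7 mod 2"));    // prints 1.0
        System.out.println(num("-7 mod 2"));   // prints -1.0 (sign follows the dividend, like Java's %)
        System.out.println(num("1 div 0"));    // prints Infinity (IEEE-754, no exception)
        System.out.println(num("0 div 0"));    // prints NaN
    }
}
```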

XPath also provides these operators for comparisons and boolean logic:

<      less than
>      greater than
<=     less than or equal to
>=     greater than or equal to
=      boolean equals (not an assignment statement as in Java)
!=     not equal to
or     Boolean or
and    Boolean and

In an XML context such as an XSLT stylesheet, some of these may need to be escaped with &lt; or &gt;. However, this is normally not necessary when using XPath in Java code.
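The comparison and boolean operators are most at home inside predicates. The sketch below writes them unescaped in Java string literals and counts the matching nodes. It assumes the JDK's javax.xml.xpath API as the engine; the sample document, the class name XPathComparisonDemo, and the num helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathComparisonDemo {

    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<items><item price='5'/><item price='20'/><item price='75'/></items>";

    /** Evaluates expr against XML and returns the result as a number. */
    static double num(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // In Java source, < and > are written as is; no &lt; or &gt; is needed.
        System.out.println(num("count(//item[@price > 10 and @price <= 50])")); // prints 1.0
        System.out.println(num("count(//item[@price < 10 or @price >= 75])"));  // prints 2.0
        System.out.println(num("count(//item[@price = 20])"));                  // prints 1.0
        System.out.println(num("count(//item[@price != 20])"));                 // prints 2.0
    }
}
```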

Additional arithmetic and boolean operations such as rounding and negation are provided by various XPath functions.

Functions

XPath defines a number of useful functions that operate on and return the four fundamental XPath data types. Some of these take variable numbers of arguments. In the list below, optional arguments are suffixed with a question mark. When an optional argument is omitted, the function normally operates on the context node instead. For the most part these functions are weakly typed. You can pass any of the four types in the place of an argument that is declared to be of type boolean, number, or string. XPath will convert it and use it. The exceptions are those functions that are declared to take node-sets as arguments. XPath cannot convert arguments of other types to node-sets.

None of these functions modify their arguments in any way. An object passed to any of these functions will be the same after the function returns as it was before the function was invoked. However, many of these functions return a new object which is a variant of one of the arguments. This characteristic is necessary to make XSLT (which depends on XPath) a functional language.
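The weak typing of function arguments is easy to observe by deliberately passing the "wrong" type. The following sketch assumes the JDK's javax.xml.xpath API as the engine (an illustrative assumption); the class name XPathWeakTypingDemo and its str helper are hypothetical.

```java
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

final class XPathWeakTypingDemo {

    /** Evaluates a context-free XPath expression and returns its string-value. */
    static String str(String expr) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expr, (Object) null, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // A number passed where a string is expected is converted to a string first.
        System.out.println(str("concat('n=', 1 + 1)"));    // prints n=2
        System.out.println(str("string-length(12345)"));   // prints 5
        System.out.println(str("starts-with(2002, 20)"));  // prints true

        // A string used in arithmetic is converted to a number first.
        System.out.println(str("'3' + 4"));                // prints 7

        // A boolean passed where a string is expected becomes "true" or "false".
        System.out.println(str("concat('flag: ', true())")); // prints flag: true
    }
}
```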

Node-set functions

number last()

Returns the number of nodes in the context node list. This is the same as the position of the last node in the list.

number position()

Returns the position of the context node in the context node list. The first node has position 1, not 0.

number count(node-set)

Returns the number of nodes in the argument node-set.

node-set id(object)

Returns a node-set containing the single element node with the specified id as determined by an ID-type attribute. If no node has the specified ID, then this function returns an empty node-set. If the argument is a node-set, then it returns a node-set containing all the element nodes whose ID matches the string-value of any of the nodes in the argument node-set.

string local-name(node-set?)

Returns the local name of the first node in the argument node-set, or the local name of the context node if the argument is omitted. It returns an empty string if the relevant node does not have a local name (i.e. it’s a comment, root, or text node.)

string namespace-uri(node-set?)

Returns the namespace name of the first node in the argument node-set, or the namespace name of the context node if the argument is omitted. It returns an empty string if the node is an element or attribute that is not in a namespace. It also returns an empty string if namespace names don’t apply to this node (i.e. it’s a comment, processing instruction, root, or text node.)

string name(node-set?)

Returns the full, prefixed name of the first node in the argument node-set, or the name of the context node if the argument is omitted. It returns the empty string if the relevant node does not have a name (e.g. it’s a comment or text node.)
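The following sketch exercises several of the node-set functions against a small namespace-aware DOM document. It assumes the JDK's javax.xml.parsers and javax.xml.xpath APIs as the parser and engine; the sample document, its namespace URI, and the class name NodeSetFunctionsDemo are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NodeSetFunctionsDemo {

    // Hypothetical sample document: a prefixed root with three unprefixed children.
    static final String XML =
        "<inv:inventory xmlns:inv='http://example.com/inventory'>"
      + "<item id='a'/><item id='b'/><item id='c'/>"
      + "</inv:inventory>";

    /** Parses XML with namespaces enabled and returns the string-value of expr. */
    static String eval(String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for local-name() and namespace-uri()
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(/*/item)"));                 // prints 3
        System.out.println(eval("/*/item[position() = 2]/@id"));    // prints b (positions start at 1)
        System.out.println(eval("/*/item[last()]/@id"));            // prints c
        System.out.println(eval("name(/*)"));                       // prints inv:inventory
        System.out.println(eval("local-name(/*)"));                 // prints inventory
        System.out.println(eval("namespace-uri(/*)"));              // prints http://example.com/inventory
        System.out.println(eval("namespace-uri(/*/item)").isEmpty()); // prints true (no namespace)
    }
}
```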

Boolean functions

boolean boolean(object)

Converts the argument to a boolean in a mostly sensible way. NaN and 0 are false. All other numbers are true. Empty strings are false. All other strings are true. Empty node-sets are false. All other node-sets are true.

boolean not(boolean)

This function turns true into false and false into true.

boolean true()

This function always returns true. It’s necessary because XPath does not have any boolean literals.

boolean false()

This function always returns false. It’s necessary because XPath does not have any boolean literals.

boolean lang(string)

This function returns true if the context node is written in the language specified by the argument. The language of the context node is determined by the currently in-scope xml:lang attribute. If there is no such attribute, this function returns false.
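The conversion rules used by boolean() hold a few surprises, most notably that the string 'false' is true because it is not empty. The sketch below checks each rule. It assumes the JDK's javax.xml.xpath API as the engine; the sample document, the class name BooleanFunctionsDemo, and the bool helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class BooleanFunctionsDemo {

    // Hypothetical sample document, used only for illustration.
    static final String XML = "<doc><p>text</p></doc>";

    /** Evaluates expr against XML and returns the result as a boolean. */
    static boolean bool(String expr) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Numbers: 0 and NaN are false, everything else is true.
        System.out.println(bool("boolean(0)"));        // prints false
        System.out.println(bool("boolean(0 div 0)"));  // prints false (NaN)
        System.out.println(bool("boolean(-3.5)"));     // prints true

        // Strings: only the empty string is false.
        System.out.println(bool("boolean('')"));       // prints false
        System.out.println(bool("boolean('false')"));  // prints true (non-empty string)

        // Node-sets: only the empty node-set is false.
        System.out.println(bool("boolean(//missing)")); // prints false
        System.out.println(bool("boolean(//p)"));       // prints true

        // not() inverts; true() and false() stand in for literals.
        System.out.println(bool("not(false())"));       // prints true
    }
}
```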

String functions

string string(object?)

This function returns the string-value of the argument. If the argument is a node-set, then it returns the string-value of the first node in the set. If the argument is omitted, it returns the string-value of the context node.

string concat(string, string, string...)

This function returns a string containing the concatenation of all its arguments.

boolean starts-with(string, string)

This function returns true if the first string starts with the second string. Otherwise it returns false.

boolean contains(string, string)

This function returns true if the first string contains the second string. Otherwise it returns false.

string substring-before(string, string)

This returns that part of the first string that precedes the second string. It returns the empty string if the second string is not a substring of the first string. If the second string appears multiple times in the first string, then this returns the portion of the first string before the first appearance of the second string.

string substring-after(string, string)

This returns that part of the first string that follows the second string. It returns the empty string if the second string is not a substring of the first string. If the second string appears multiple times in the first string, then this returns the portion of the first string after the initial appearance of the second string.

string substring(string, number, number?)

This returns the substring of the first argument beginning at the position given by the second argument and continuing for the number of characters specified by the third argument (or until the end of the string if the third argument is omitted). Positions are counted from 1, not 0, so substring("12345", 2, 3) returns "234".

number string-length(string?)

Returns the number of Unicode characters in the argument string, or in the string-value of the context node if the argument is omitted. This may not be the same as the number returned by the length() method in Java’s String class because XPath counts characters and Java counts UTF-16 code units.

string normalize-space(string?)

This function strips all leading and trailing whitespace from its argument, or from the string-value of the context node if the argument is omitted, and condenses all other runs of whitespace to a single space. It’s very useful in XML documents where whitespace is used primarily for formatting.

string translate(string, string, string)

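This function replaces each character in the first string that is found in the second string with the character at the corresponding position in the third string. Characters that have no counterpart because the third string is shorter than the second are deleted.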
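The string functions can be tried out without any input document at all. The sketch below assumes the JDK's javax.xml.xpath API as the engine (an illustrative assumption); the class name StringFunctionsDemo and its str helper are hypothetical.

```java
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

final class StringFunctionsDemo {

    /** Evaluates a context-free XPath expression and returns its string-value. */
    static String str(String expr) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expr, (Object) null, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(str("concat('2002', '-', '06')"));            // prints 2002-06
        System.out.println(str("starts-with('2002-06-07', '2002')"));    // prints true
        System.out.println(str("contains('2002-06-07', '-06-')"));       // prints true

        // Both split at the first occurrence of the separator.
        System.out.println(str("substring-before('2002-06-07', '-')"));  // prints 2002
        System.out.println(str("substring-after('2002-06-07', '-')"));   // prints 06-07

        // Positions are 1-based; the third argument is a length, not an end index.
        System.out.println(str("substring('12345', 2, 3)"));             // prints 234
        System.out.println(str("substring('12345', 2)"));                // prints 2345

        System.out.println(str("normalize-space('  a   b  ')"));         // prints a b

        // Character-by-character mapping; '-' has no counterpart, so it is deleted.
        System.out.println(str("translate('bar', 'abc', 'ABC')"));       // prints BAr
        System.out.println(str("translate('--aaa--', 'abc-', 'ABC')"));  // prints AAA
    }
}
```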

Number functions

number number(object?)

This function converts its argument to a number in a reasonable way. Strings like "23" and "42.5" are converted exactly as you’d expect. Other strings are converted to NaN. Node-sets are converted by converting the string-value of the first node in the set. True booleans are converted to 1. False booleans are converted to 0. If the argument is omitted, it converts the string-value of the context node to a number.

number sum(node-set)

Each node in the node-set is converted to a number, as if by the number() function. Those numbers are added together, and the sum is returned.

number floor(number)

Returns the largest integer less than or equal to the argument.

number ceiling(number)

Returns the smallest integer greater than or equal to the argument.

number round(number)

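Returns the integer nearest to the argument. A value exactly halfway between two integers is rounded toward positive infinity, so round(2.5) is 3 and round(-2.5) is -2.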
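The number functions can be sketched in the same way. The example below assumes the JDK's javax.xml.xpath API as the engine; the sample order document, the class name NumberFunctionsDemo, and the num helper are hypothetical. The prices are chosen so that their sum is exactly representable as a double.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class NumberFunctionsDemo {

    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<order><line price='10'/><line price='20'/><line price='12.5'/></order>";

    /** Evaluates expr against XML and returns the result as a number. */
    static double num(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(num("number('42.5')"));   // prints 42.5
        System.out.println(num("number('abc')"));    // prints NaN (no exception is thrown)
        System.out.println(num("number(true())"));   // prints 1.0
        System.out.println(num("number(false())"));  // prints 0.0

        // Each attribute's string-value is converted to a number, then added.
        System.out.println(num("sum(/order/line/@price)")); // prints 42.5

        System.out.println(num("floor(-2.5)"));      // prints -3.0
        System.out.println(num("ceiling(-2.5)"));    // prints -2.0
        System.out.println(num("round(2.5)"));       // prints 3.0
        System.out.println(num("round(-2.5)"));      // prints -2.0 (ties go toward positive infinity)
    }
}
```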

There’s more to XPath than the basics I’ve covered here. In particular I haven’t discussed variables or extension functions, since both of these are normally only important when using XPath as part of XSLT or XQuery, rather than when using raw XPath in combination with Java. However, this should give you the basic knowledge you need to write simple XPath expressions and include those in your programs. Now it’s time to investigate the APIs that enable you to do this.


Copyright 2001, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified June 07, 2002