Extending XSLT with Java


TrAX lets you integrate XSLT code with Java programs. Most XSLT processors written in Java also let you go the other way, integrating Java code with XSLT stylesheets. The most common reason to do this is to provide access to operating system functionality XSLT doesn’t offer, such as querying a database, listing the files in a directory, or asking the user for more information with a dialog box. Java can also be used when you simply find it easier to implement some complex algorithm in imperative Java rather than functional XSLT. For example, although you can do complicated string searching and replacing in XSLT, I guarantee you it will be about a thousand times easier in Java, especially with a good regular expression library. And finally, even though a function could be implemented in pure XSLT relatively easily, you may choose to write it in Java anyway purely for performance reasons. This is especially true for mathematical functions like factorial and Fibonacci. XSLT optimizers are not nearly as mature or as reliable as Java optimizers, and those that do exist mostly focus on optimizing XPath search and evaluation on node-sets rather than mathematical operations on numbers.
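To make the string-replacement point concrete, here is a minimal sketch of the kind of static method a stylesheet might call for regular expression work. The class name RegexFunctions and its method are hypothetical; they are not part of any processor or library, and a real class would normally live in a proper package. How the method gets bound to a stylesheet depends on the processor, as the rest of this section explains.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
// A package declaration is omitted here for brevity.
public class RegexFunctions {

  // Replaces every match of regex in input with replacement.
  // If the pattern or the replacement is malformed, the input is
  // returned unchanged rather than letting an exception escape.
  public static String replaceAll(String input, String regex,
                                  String replacement) {
    try {
      return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    } catch (IllegalArgumentException | IndexOutOfBoundsException ex) {
      // PatternSyntaxException is a subclass of IllegalArgumentException
      return input;
    }
  }

}
```

Returning the input unchanged on a bad pattern is only one possible design choice. It keeps the transformation running, at the price of hiding the mistake from the stylesheet author.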

XSLT defines two mechanisms for integrating Java code into stylesheets: extension functions and extension elements. These are invoked exactly like built-in functions and elements such as document() and xsl:template. However, rather than being provided by the processor, they’re written in Java. Furthermore, they have names in some non-XSLT namespace. The exact way such functions and elements are linked with the processor varies from processor to processor.

Regardless of which XSLT processor you’re using, there are two basic parts to writing and using extension functions and elements:

Binding the extensions to the stylesheet. This is done via namespaces, class names, and the Java class path.
Mapping the five XSLT types (number, boolean, string, node-set, and result tree fragment) to Java types and vice versa.

Extension Functions

As an example, I’m going to write a simple extension function that calculates Fibonacci numbers. This can be used as a faster alternative to the earlier recursive template. Example 17.19 contains this function. The entire class is in the com.macfaq.math package. When writing extension functions and elements, you really have to use proper Java package naming and set up your class path appropriately.

Example 17.19. A Java class that calculates Fibonacci numbers

package com.macfaq.math;

import java.math.BigInteger;

public class FibonacciNumber {

  public static BigInteger calculate(int n) {
  
    if (n <= 0) {
      throw new IllegalArgumentException(
       "Fibonacci numbers are only defined for positive integers"
      );
    }
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;
    
    for (int i = 3; i <= n; i++) {
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    
    return high;
  
  }

}

Notice there’s nothing about XSLT in this example. This is just like any other Java class. On the Java side, all you need to do to make it accessible to the XSLT processor is compile it and install the .class file in the proper place in the processor’s class path.

If the extension function throws an exception, as calculate() does when it’s passed zero or a negative number as an argument, then the XSLT processing will halt. XSLT has no way to catch and respond to exceptions thrown by extension functions. Consequently, if you want to handle them, you’ll need to handle them in the Java code. After catching the exception, you’ll want to return something. Possibilities include:

A String containing an error message
A NodeList containing a fault document
An integer error code

Since this may not be the same type you normally return, you’ll probably need to declare that the method returns Object to give you the additional flexibility. For example, this method returns an error message inside a String instead of throwing an exception:

  public static Object calculate(int n) {
  
    if (n <= 0) {
     return
      "Fibonacci numbers are only defined for positive integers";
    }
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;
    
    for (int i = 3; i <= n; i++) {
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    
    return high;
  
  }

This method returns -1 (an illegal value for a Fibonacci number) instead of throwing an exception:

  public static BigInteger calculate(int n) {
  
    if (n <= 0) return new BigInteger("-1");
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;
    
    for (int i = 3; i <= n; i++) {
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    
    return high;
  
  }

It would be up to the stylesheet to check for the error code before using the result, and handle such a situation appropriately. In this example, that might require calling the extension function before any output is generated, storing the result in a variable, and deciding whether to output a successful response or a fault document based on the value of that variable. Waiting until the template for the int element is activated would be too late because by that point substantial parts of a successful response document have already been generated.

Now we need a stylesheet that uses this function to calculate Fibonacci numbers instead of the XSLT template. The details at this point are a little processor specific. I will cover the two most popular, Saxon and Xalan. As you’ll see there are quite a few points of similarity between them (though I think Saxon’s approach is the cleaner of the two). Most other processors are likely to use something similar.

Tip

Before spending a lot of time and effort writing your own extension functions, check to see if the EXSLT library already has the extension function you need. EXSLT provides many useful extension functions and elements for working with dates and times, functions, math, strings, regular expressions, sets, and more. This library has been ported to many different processors in many different platforms and languages. I use some of the date functions in the stylesheets for this book.

Extension functions in Saxon

Saxon allows you to bind any Java class to a namespace prefix. The trick is to use the custom URI scheme java followed by a colon and the fully package-qualified name of the class. For example, this attribute binds the namespace prefix fib to the com.macfaq.math.FibonacciNumber class:

xmlns:fib="java:com.macfaq.math.FibonacciNumber"

As long as this mapping is in scope, you can invoke any static function in the com.macfaq.math.FibonacciNumber class by using the prefix fib and the name of the method. For example, the old template for the int element could be replaced by this one:

  <xsl:template match="int"
                xmlns:fib="java:com.macfaq.math.FibonacciNumber">
    <int>
      <xsl:value-of select="fib:calculate(number(.))"/>
    </int>
  </xsl:template>

Here the number() function converts the value of the context node to an XSLT number. Then the processor looks for a static method named calculate() in the Java class mapped to the fib prefix that takes a single argument. It finds one, invokes it, and inserts the return value into the result tree.

XSLT is much more weakly typed than Java, and this can be useful when writing extension functions. Saxon will only invoke methods that have the right name and the right number of arguments. However, it will often convert the types of arguments and return values as necessary to make a function fit. Here the calculate() method expects to receive an int, but an XSLT number is really more like a Java double. Since Saxon can’t find a matching method that takes a double, it truncates the fractional part of the double to get an int and invokes the method that takes an int. This is a conversion that Java itself would not do without an explicit cast.
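The effect of that conversion can be mimicked in plain Java. The sketch below only illustrates the arithmetic of truncating a double to an int. It is not Saxon’s code, and the class name NumberConversionDemo is made up for the example.

```java
// Illustration only: mimics the effect of the conversion described
// above. This is not Saxon's implementation.
public class NumberConversionDemo {

  // An XSLT number behaves like a Java double. Passing one to a
  // method that expects an int takes an explicit narrowing cast,
  // which discards the fractional part (rounding toward zero).
  public static int toIntArgument(double xsltNumber) {
    return (int) xsltNumber;
  }

}
```

In plain Java this cast turns 10.9 into 10 and NaN into 0, which is worth remembering because the XPath number() function returns NaN for text that isn’t numeric.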

Going in the opposite direction, the calculate() method returns a BigInteger, which is not equivalent to any of XSLT’s types. Thus Saxon converts it to a string using its toString() before inserting it into the result tree. Other more recognizable return types may be converted differently. For example, void is converted to an empty node-set and primitive number types like int and double are converted to XSLT numbers as are type-wrapper classes like Integer and Double. A DOM NodeList is converted to an XPath node-set. However, the nodes in the list must all be created by Saxon’s own DOM implementation. You can’t use third party DOM implementations like Xerces or GNU JAXP in a Saxon extension function.

Tip

Namespace mappings for extension functions and elements are normally only relevant in the stylesheet. Nonetheless they often have an annoying habit of popping up in the output document. If you know that an extension element or function prefix will not be used in the output document (and 99% of the time you do know exactly this) you can add an exclude-result-prefixes attribute to the stylesheet root element that contains a list of the namespace prefixes whose declarations should not be copied into the output document. For example,

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fib="java:com.macfaq.math.FibonacciNumber"
  xmlns:saxon="http://icl.com/saxon"
  exclude-result-prefixes="fib saxon">

Instance Methods and Constructors

XSLT is not an object oriented language. Static methods fit much more neatly into its structures than do objects and instance methods. If I’m writing a method just for XSLT, I’ll normally make it static if at all possible. However, Saxon can use instance methods as extension functions too. As before, the fully package qualified class name must be bound to a namespace prefix. The constructor for the class can be called using the special local function name new(). For example, this template retrieves the current time using the Java Date class:

<xsl:template name="currentTime" 
              xmlns:date="java:java.util.Date">
  <xsl:value-of select="date:new()"/>
</xsl:template>

date:new() in XSLT is basically the same thing as new Date() in Java. When the Date constructor is invoked with no arguments, Java initializes the resulting Date object to the current time. You can also pass arguments to constructors, just like you can to static methods.

The object the new() function returns is normally assigned to a variable. You can pass this variable to other extension functions as an argument. To invoke instance methods on that object, pass the variable that points to the object whose instance method you’re invoking as the first argument to the instance method. Then the normal first argument gets pushed over to become the second argument, the second argument becomes the third, and so on. For example, this template uses the GregorianCalendar class to get today’s date. First it uses the static getInstance() method to return a GregorianCalendar object initialized to the current time. Then it passes the appropriate integer constants to the get() instance method to retrieve the month, day, and year. It produces the current date in the form 2002-3-26.

<xsl:template name="today" 
              xmlns:cal="java:java.util.GregorianCalendar">
  <xsl:variable name="rightNow" select="cal:getInstance()" />
  <!-- The Calendar class uses zero-based months; 
       i.e. January is month 0, February is month 1, and 
       so on. We have to add one to get the customary month 
       number. -->
  <xsl:variable name="month" select="cal:get($rightNow, 2) + 1" />
  <xsl:variable name="day" select="cal:get($rightNow, 5)" />
  <xsl:variable name="year" select="cal:get($rightNow, 1)" />
  <xsl:value-of 
   select="$year" />-<xsl:value-of 
   select="$month" />-<xsl:value-of 
   select="$day" />
</xsl:template>

Note

If I were writing this in Java rather than XSLT, the code would look like this:

// Assumes java.util.Calendar has been imported.
Calendar rightNow = Calendar.getInstance();
// Months are zero-based; i.e. January is month 0, February is
// month 1, and so on. We have to add one to get the customary
// month number.
int month = rightNow.get(Calendar.MONTH) + 1;
int date  = rightNow.get(Calendar.DATE);
int year  = rightNow.get(Calendar.YEAR);
String result = year + "-" + month + "-" + date;

However, Saxon doesn’t support extension fields so XSLT must use the actual constant key values instead of the named constants.

If you absolutely have to use the value of a field (e.g. because a method expects an instance of the type-safe enum pattern instead of an int constant), you can always write an extension function whose sole purpose is to return the relevant field.
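A field-returning extension function of this kind can be very small. The following sketch wraps the three Calendar constants used in the template above. The class name CalendarFields and its method names are hypothetical, chosen only for this illustration.

```java
import java.util.Calendar;

// Hypothetical helper class; the name is illustrative only.
// Each static method returns the value of one named constant so a
// stylesheet can call a function instead of hard-coding the number.
public class CalendarFields {

  public static int year()  { return Calendar.YEAR;  }
  public static int month() { return Calendar.MONTH; }
  public static int date()  { return Calendar.DATE;  }

}
```

Assuming this class were bound to a prefix such as fields in the same way the other classes in this section are bound, an expression like cal:get($rightNow, fields:month()) + 1 could replace cal:get($rightNow, 2) + 1.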

Extension functions in Xalan

Xalan’s extension function mechanism is a little more complicated and a little more powerful than Saxon’s, but not a great deal more. Xalan offers somewhat greater access to the XSLT context inside extension functions if you need it, and has some additional shortcuts for mapping Java classes to namespace prefixes. Most importantly, it allows extension functions to work with any compliant DOM2 implementation, rather than requiring its own custom DOM.

Xalan uses the custom URI scheme xalan to bind namespace prefixes to classes. To bind a Java class to a namespace prefix in Xalan, you add an attribute of the form xmlns:prefix="xalan://packagename.classname" to the root element of the stylesheet or some other ancestor element. For example, this attribute binds the namespace prefix fib to the com.macfaq.math.FibonacciNumber class:

xmlns:fib="xalan://com.macfaq.math.FibonacciNumber"

As long as this mapping is in scope, you can invoke any static function in the com.macfaq.math.FibonacciNumber class by using the prefix fib and the name of the method. For example, the pure XSLT template for the int element could be replaced by this one:

<xsl:template match="int"
   xmlns:fib="xalan://com.macfaq.math.FibonacciNumber">
  <int>
    <xsl:value-of select="fib:calculate(number(.))"/>
  </int>
</xsl:template>

Xalan also allows you to define a namespace prefix for the entire Java class library by associating it with the URI http://xml.apache.org/xslt/java. The function calls must then use fully qualified class names. For example, this template uses the prefix java to identify extension functions:

<xsl:template match="int"
    xmlns:java="http://xml.apache.org/xslt/java">
  <int>
    <xsl:value-of select=
     "java:com.macfaq.math.FibonacciNumber.calculate(number(.))"
    />
  </int>
</xsl:template>

This form is convenient if your stylesheets use many different classes. It is of course not limited to classes you write yourself. It works equally well for classes from the standard library and third-party libraries. For example, here’s a random template that uses Java’s Math.random() method:

<xsl:template name="random"
              xmlns:java="http://xml.apache.org/xslt/java">
  <xsl:value-of select="java:java.lang.Math.random()" />
</xsl:template>

Constructors and Instance Methods

Xalan can use instance methods as extension functions too. The new() function invokes the constructor for the class and can take whatever arguments the constructor requires. For example, this template retrieves the current time using the Java Date class:

<xsl:template name="currentTime" 
              xmlns:java="http://xml.apache.org/xslt/java">
  <xsl:value-of select="java:java.util.Date.new()"/>
</xsl:template>

If the prefix is bound to a specific class, you can omit the class name. For example,

<xsl:template name="currentTime" 
              xmlns:date="xalan://java.util.Date">
  <xsl:value-of select="date:new()"/>
</xsl:template>

The object the new() function returns can be assigned to an XSLT variable that can then be passed as an argument to other extension functions or used to invoke instance methods on the object. As in Saxon, to invoke an instance method pass the object whose method you’re invoking as the first argument to the method. For example, here’s the Xalan version of the GregorianCalendar template that produces the current date in the form 2002-3-26.

<xsl:template name="today" 
              xmlns:cal="xalan://java.util.GregorianCalendar">
  <xsl:variable name="rightNow" select="cal:getInstance()" />
  <!-- The GregorianCalendar class counts months from zero
       so we have to add one to get the customary number -->
  <xsl:variable name="month" select="cal:get($rightNow, 2) + 1" />
  <xsl:variable name="day" select="cal:get($rightNow, 5)" />
  <xsl:variable name="year" select="cal:get($rightNow, 1)" />
  <xsl:value-of 
   select="$year" />-<xsl:value-of 
   select="$month" />-<xsl:value-of 
   select="$day" />
</xsl:template>

Like Saxon, Xalan also doesn’t let you access fields in a class, so once again it’s necessary to use the actual values instead of the named constants for the arguments to the get() method.

Exceptions thrown by extension functions have the same results in Xalan as in Saxon; that is, the XSLT processing halts, possibly in the middle of transforming a document. Once again, it’s probably a good idea to design your extension functions so that they handle all probable exceptions internally and always return a sensible result.
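One way to follow that advice is to accept the widest plausible argument type and validate it before doing any work. The sketch below is a hypothetical variant of the FibonacciNumber class, not a replacement the text prescribes. The class name SafeFibonacci, the -1 error value (borrowed from the earlier error-code example), and the upper limit on the index are all assumptions made for illustration.

```java
import java.math.BigInteger;

// Hypothetical defensive variant of the FibonacciNumber class.
public class SafeFibonacci {

  // Returned for any argument that is not a usable index.
  // -1 is never a Fibonacci number, so a stylesheet can test for it.
  private static final BigInteger ERROR = BigInteger.valueOf(-1);

  // Arbitrary cap, assumed here only to keep run time bounded.
  private static final int MAX_INDEX = 100000;

  // Takes a double because an XSLT number is closer to a Java double
  // than to an int. NaN, fractions, and out-of-range values yield
  // the error code instead of an exception.
  public static BigInteger calculate(double n) {
    if (Double.isNaN(n) || n < 1 || n > MAX_INDEX
        || n != Math.floor(n)) {
      return ERROR;
    }
    int index = (int) n;
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;
    for (int i = 3; i <= index; i++) {
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    return high;
  }

}
```

Because number(.) produces NaN for non-numeric text, checking for NaN inside the Java method covers a failure the int version never gets a chance to see.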

Type Conversion

Xalan converts method arguments and return types between Java and XSLT types in a mostly intuitive way. Table 17.1 lists the conversions from XSLT’s five types to Java types in order of preference:

Table 17.1. Xalan Conversions from XSLT to Java

XSLT type	Java types (in decreasing order of preference)
node-set	`org.w3c.dom.traversal.NodeIterator`, `org.w3c.dom.NodeList`, `org.w3c.dom.Node`, `String`, `Object`, `char`, `double`, `float`, `long`, `int`, `short`, `byte`, `boolean`
string	`String`, `Object`, `char`, `double`, `float`, `long`, `int`, `short`, `byte`, `boolean`
boolean	`boolean`, `Boolean`, `Object`, `String`
number	`double`, `Double`, `float`, `long`, `int`, `short`, `char`, `byte`, `boolean`, `String`, `Object`
result tree fragment	`org.w3c.dom.traversal.NodeIterator`, `org.w3c.dom.NodeList`, `org.w3c.dom.Node`, `String`, `Object`, `char`, `double`, `float`, `long`, `int`, `short`, `byte`, `boolean`

Moving in the other direction from Java to XSLT, the conversions are fairly obvious. Table 17.2 summarizes them. Besides the ones listed here, other object types will normally be converted to a string using their toString() method if they’re actually dereferenced somewhere in the stylesheet. However, their original type will be maintained when they’re passed back to another extension function.

Table 17.2. Xalan Conversions from Java to XSLT

Java type	Xalan XSLT type
`org.w3c.dom.traversal.NodeIterator`	node-set
`org.apache.xml.dtm.DTM`	node-set
`org.apache.xml.dtm.DTMAxisIterator`	node-set
`org.apache.xml.dtm.DTMIterator`	node-set
`org.w3c.dom.Node` and its subtypes (`Element`, `Attr`, etc)	node-set
`org.w3c.dom.DocumentFragment`	result tree fragment
`String`	string
`Boolean`	boolean
`Number` and its subclasses (`Double`, `Integer`, etc)	number
`double`	number
`float`	number
`int`	number
`long`	number
`short`	number
`byte`	number
`char`	object
`boolean`	boolean
`null`	empty string
`void`	empty string

Expression Context

There is one thing Xalan extension functions can do that Saxon extension functions can’t. A Xalan extension function can receive the current XSLT context as an argument. This provides information about the context node, the context node position, the context node list, and variable bindings. Admittedly, needing to know this information inside an extension function is rare. Most operations that consider the current context are more easily implemented in XSLT than Java. Nonetheless, if you need to know this for some reason, you can declare that the initial argument to your function has type org.apache.xalan.extensions.ExpressionContext; for example,

public static Node findMaximum(ExpressionContext context);

You do not need to pass an argument of this type explicitly. Xalan will create an ExpressionContext object for you and pass it to the method automatically. Furthermore, Xalan will always pick a method that takes an ExpressionContext over one that does not.

The Xalan-J ExpressionContext interface, shown in Example 17.20, provides methods to get the context node and the context node list, convert a node into either its string or number value (as defined by the XPath string() and number() functions), and get the XPath object bound to a known variable or parameter.

Example 17.20. The Xalan ExpressionContext interface

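package org.apache.xalan.extensions;

import org.apache.xpath.objects.XObject;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeIterator;

public interface ExpressionContext {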

  public Node         getContextNode();
  public NodeIterator getContextNodes();
  public double       toNumber(Node n);
  public String       toString(Node n);
  public XObject      getVariableOrParam(
   org.apache.xml.utils.QName qualifiedName)
   throws javax.xml.transform.TransformerException;

}

Extension Elements

An extension element is much like an extension function. However in the stylesheet it appears as an entire element such as <saxon:script/> or <redirect:write /> rather than as a mere function in an XPath expression contained in a select or test attribute. Any value it returns is placed directly in the result tree.

For example, suppose you wanted to define a fibonacci element like this one:

<fib:fibonacci xmlns:fib="java:com.macfaq.math.FibonacciNumber">
  10
</fib:fibonacci>

When processed, this element would be replaced by the specified Fibonacci number.

The first question is how the XSLT processor should recognize this as an extension element. After all, fib:fibonacci looks just like a literal result element that should be copied verbatim. The answer is that the xsl:stylesheet root element (or some other ancestor element) should have an extension-element-prefixes attribute containing a whitespace separated list of namespace prefixes that identify extension elements. For example, this stylesheet uses the saxon and fib prefixes for extension elements:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:saxon="http://icl.com/saxon"
  xmlns:fib="java:com.macfaq.math.FibonacciNumber"
  extension-element-prefixes="saxon fib">
  
  <!-- ... -->
  
</xsl:stylesheet>

Since you can’t be sure which extension elements are likely to be available across processors, it’s customary to include one or more xsl:fallback elements as children of each extension element. Each such element contains a template that is instantiated if and only if the parent extension element can’t be found. Example 17.21 demonstrates a stylesheet that attempts to use the fib:fibonacci extension element. However, if that element cannot be found then a pure XSLT solution is used instead.

Example 17.21. A stylesheet that uses an extension element

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fib="http://namespaces.cafeconleche.org/fibonacci"
  extension-element-prefixes="fib">
  
  <!-- I deleted the validation code from this stylesheet to
       save space, but it would be easy to add back in if
       needed for production use. -->
  
  <xsl:template match="/methodCall">
    <methodResponse>
      <params>
        <param>
          <value>
            <xsl:apply-templates select="params/param/value" />
          </value>
        </param>
      </params>
    </methodResponse>
  </xsl:template>

  <xsl:template match="value">
    <int>
      <fib:fibonacci>
        <xsl:value-of select="number(.)"/>
        <xsl:fallback>
          <!-- This template will be called only if the 
               fib:fibonacci code can't be loaded. -->
          <xsl:call-template name="calculateFibonacci">
            <xsl:with-param name="index" select="number(.)" />
          </xsl:call-template>
        </xsl:fallback>
      </fib:fibonacci>
    </int>
  </xsl:template>

  <xsl:template name="calculateFibonacci">
    <xsl:param name="index"/>
    <xsl:param name="low"  select="1"/>
    <xsl:param name="high" select="1"/>
    <xsl:choose>
      <xsl:when test="$index &lt;= 1">
        <xsl:value-of select="$low"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="calculateFibonacci">
          <xsl:with-param name="index" select="$index - 1"/>
          <xsl:with-param name="low"   select="$high"/>
          <xsl:with-param name="high"  select="$high + $low"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
</xsl:stylesheet>

Alternatively, you can pass the namespace qualified name of the extension element to the element-available() function to figure out whether or not the extension is available. For example,

  <xsl:template match="value">
    <int>
      <xsl:choose>
        <xsl:when test="element-available('fib:fibonacci')">
          <fib:fibonacci>
            <xsl:value-of select="number(.)"/>
          </fib:fibonacci>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="calculateFibonacci">
            <xsl:with-param name="index" select="number(.)" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </int>
  </xsl:template>

From this point on, the exact details of how you code the extension element in Java are quite implementation dependent. You’ll need to consult the documentation for your XSLT processor to learn how to write an extension element and install it. You cannot use preexisting methods and classes as extension elements. You need to custom code the extension element so it fits in with the processor’s own code.

Caution

Writing an extension element is much more complex than writing an extension function. It requires intimate knowledge of and interaction with the XSLT processor. If at all possible, you should probably use an extension function, perhaps one that returns a node-set, instead of an extension element.
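As a rough idea of what a node-returning extension function might look like, the sketch below builds a small DOM tree of Fibonacci numbers with the standard JAXP and DOM APIs. The class name FibonacciNodes and the sequence and fibonacci element names are hypothetical. It assumes a processor such as Xalan that accepts nodes from any compliant DOM implementation; as noted earlier, Saxon requires nodes created by its own DOM.

```java
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class; the element names are illustrative only.
public class FibonacciNodes {

  // Builds <sequence><fibonacci index="1">1</fibonacci>...</sequence>
  // holding the first count Fibonacci numbers and returns the root
  // element. Returns null for a non-positive count or if no DOM
  // builder can be created.
  public static Node sequence(int count) {
    if (count <= 0) return null;
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance()
             .newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      return null;
    }
    Element root = doc.createElement("sequence");
    doc.appendChild(root);
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;
    for (int i = 1; i <= count; i++) {
      Element item = doc.createElement("fibonacci");
      item.setAttribute("index", Integer.toString(i));
      item.appendChild(doc.createTextNode(high.toString()));
      root.appendChild(item);
      BigInteger next = low.add(high);
      low = high;
      high = next;
    }
    return root;
  }

}
```

According to Table 17.2, Xalan treats a returned Node as a node-set and a returned null as an empty string, so a stylesheet calling a function like this one would need to allow for both.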

Copyright 2001, 2002 Elliotte Rusty Harold	elharo@metalab.unc.edu	Last Modified May 20, 2002