TrAX


TrAX, the Transformations API for XML, is a Java API for performing XSLT transforms. It is sufficiently processor-independent that it can work with many different XSLT processors including Xalan and SAXON. It is sufficiently model-independent that it can transform to and from XML streams, SAX event sequences, and DOM and JDOM trees.

TrAX is a standard part of JAXP, and is bundled with Java 1.4 and later. Furthermore, most current XSLT processors written in Java support TrAX including Xalan-J 2.x, jd.xslt, LotusXSL, and Saxon. The specific implementation included with Java 1.4.0 is Xalan-J 2.2D10.

Note

Annoyingly, the Xalan-J classes included in Java 1.4 are zipped into the rt.jar archive so it’s hard to replace them with a less-buggy release version of Xalan. It can be done, but you have to put the xalan.jar file in your $JAVA_HOME/lib/endorsed directory rather than in the normal jre/lib/ext directory. The exact location of $JAVA_HOME varies from system to system, but it’s probably something like C:\j2sdk1.4.0 on Windows. None of this is an issue with Java 1.3 and earlier, which don’t bundle these classes. On these systems you just need to install whatever jar files your XSLT engine vendor provides in the usual locations, the same as you would any other third party library.

There are four main classes and interfaces in TrAX that you need to use, all in the javax.xml.transform package:

Transformer: The class that represents the style sheet. It transforms a Source into a Result.
TransformerFactory: The class that represents the XSLT processor. This is a factory class that reads a stylesheet to produce a new Transformer.
Source: The interface that represents the input XML document to be transformed, whether presented as a DOM tree, an InputStream, or a SAX event sequence.
Result: The interface that represents the XML document produced by the transformation, whether generated as a DOM tree, an OutputStream, or a SAX event sequence.

To transform an input document into an output document follow these steps:

1. Load the TransformerFactory with the static TransformerFactory.newInstance() factory method.
2. Form a Source object from the XSLT stylesheet.
3. Pass this Source object to the factory’s newTransformer() factory method to build a Transformer object.
4. Build a Source object from the input XML document you wish to transform.
5. Build a Result object for the target of the transformation.
6. Pass both the source and the result to the Transformer object’s transform() method.

Steps four through six can be repeated for as many different input documents as you want. You can reuse the same Transformer object repeatedly in series, though you can’t use it in multiple threads in parallel.

For example, suppose you want to use the Fibonacci stylesheet in Example 17.5 to implement a simple XML-RPC server. The request document will arrive on an InputStream named in and be returned on an OutputStream named out. Therefore, we’ll use javax.xml.transform.stream.StreamSource as the Source for the input document and javax.xml.transform.stream.StreamResult as the Result for the output document. The stylesheet itself will also be assumed to live at the relative URL FibonacciXMLRPC.xsl, and will also be loaded into a javax.xml.transform.stream.StreamSource. This code fragment performs that transform:

try {
  TransformerFactory xformFactory 
   = TransformerFactory.newInstance();
  Source xsl = new StreamSource("FibonacciXMLRPC.xsl");
  Transformer stylesheet = xformFactory.newTransformer(xsl);

  Source request  = new StreamSource(in);
  Result response = new StreamResult(out);
  stylesheet.transform(request, response);
}
catch (TransformerException e) {
  System.err.println(e); 
}
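
The same sequence of calls can be tried without any files or network connections by reading both the stylesheet and the input document from strings. What follows is only a minimal sketch, not the Fibonacci example: the class name InMemoryTransform, the tiny stylesheet, and the sample document are all invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of TrAX or of this chapter's examples
final class InMemoryTransform {

  // A made-up stylesheet that writes the text of the root
  // greeting element as plain text
  static final String STYLESHEET =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'>"
    + "<xsl:value-of select='/greeting'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  static String transform(String xml) throws TransformerException {
    // Load the factory and build a Transformer from the stylesheet
    TransformerFactory xformFactory = TransformerFactory.newInstance();
    Source xsl = new StreamSource(new StringReader(STYLESHEET));
    Transformer stylesheet = xformFactory.newTransformer(xsl);

    // Build the Source and the Result, then run the transform
    Source request = new StreamSource(new StringReader(xml));
    StringWriter out = new StringWriter();
    Result response = new StreamResult(out);
    stylesheet.transform(request, response);
    return out.toString();
  }

  public static void main(String[] args) throws TransformerException {
    System.out.println(transform("<greeting>Hello TrAX</greeting>"));
  }

}
```

Because StreamSource and StreamResult accept a Reader and a Writer as well as byte streams, a fragment like this is a convenient way to experiment with a stylesheet before wiring it into a server.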

Thread Safety

Neither TransformerFactory nor Transformer is guaranteed to be thread-safe. If your program is multi-threaded, the simplest solution is just to give each separate thread its own TransformerFactory and Transformer objects. However, this can be expensive, especially if you frequently reuse the same large stylesheet, since it will need to be read from disk or the network and parsed every time you create a new Transformer object. There is also likely to be some overhead in building the processor’s internal representation of an XSLT stylesheet from the parsed XML tree.

An alternative is to ask the TransformerFactory to build a Templates object instead. The Templates interface represents the parsed stylesheet. You can then ask the Templates object to give you as many separate Transformer objects as you need, each of which can be created very quickly by copying the processor’s in-memory data structures rather than by reparsing the entire stylesheet from disk or the network. A Templates object can itself be safely used across multiple threads.

For example, you might begin loading and compiling the stylesheet like this:

  TransformerFactory xformFactory 
   = TransformerFactory.newInstance();
  Source xsl = new StreamSource("FibonacciXMLRPC.xsl");
  Templates templates = xformFactory.newTemplates(xsl);

Then later in a loop you’d repeatedly load documents and transform them like this:

while (true) {
  InputStream  in   = getNextDocument();
  OutputStream out  = getNextTarget();
  Source request    = new StreamSource(in);
  Result response   = new StreamResult(out);
  Transformer transformer = templates.newTransformer();
  transformer.transform(request, response);
}

Since the thread-unsafe Transformer object is local to the while loop, references to it don’t escape into other threads. This prevents the transform() method from being called concurrently. The Templates object may be shared among multiple threads. However, it is thread safe so this isn’t a problem. Furthermore, all the time-consuming work is done when the Templates object is created. Calling templates.newTransformer() is very quick by comparison.
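
To make the sharing concrete, here is a rough sketch in which several worker threads use one Templates object, each creating its own short-lived Transformer. The class name SharedTemplatesDemo, the one-template stylesheet, and the use of java.util.concurrent (which is much newer than Java 1.4) are all assumptions chosen to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the stylesheet below is invented for illustration
final class SharedTemplatesDemo {

  static final String STYLESHEET =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'><xsl:value-of select='/n'/></xsl:template>"
    + "</xsl:stylesheet>";

  static List<String> transformAll(List<String> documents, int threads)
   throws Exception {

    // The expensive step happens once, before any worker thread starts
    TransformerFactory xformFactory = TransformerFactory.newInstance();
    final Templates templates = xformFactory.newTemplates(
     new StreamSource(new StringReader(STYLESHEET)));

    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (final String document : documents) {
        Callable<String> task = () -> {
          // Each task gets its own Transformer; only templates is shared
          Transformer transformer = templates.newTransformer();
          StringWriter out = new StringWriter();
          transformer.transform(
           new StreamSource(new StringReader(document)),
           new StreamResult(out));
          return out.toString();
        };
        pending.add(pool.submit(task));
      }
      // Collect the results in the order the documents were submitted
      List<String> results = new ArrayList<>();
      for (Future<String> future : pending) {
        results.add(future.get());
      }
      return results;
    }
    finally {
      pool.shutdown();
    }

  }

}
```

The Transformer never leaves the task that created it, so no two threads can call its transform() method at the same time.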

This technique is particularly important in server environments where the transform may be applied to thousands of different input documents with potentially dozens being processed in parallel in separate threads at the same time. Example 17.6 demonstrates with yet another variation of the Fibonacci XML-RPC servlet. This is the first variation that does not implement the SingleThreadModel interface. It can safely run in multiple threads simultaneously.

Example 17.6. A servlet that uses TrAX and XSLT to respond to XML-RPC requests

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;


public class FibonacciXMLRPCXSLServlet extends HttpServlet {

  private Templates stylesheet;
  
  // Load the stylesheet
  public void init() throws ServletException {  
    
    try {
      TransformerFactory xformFactory 
       = TransformerFactory.newInstance();
      Source source   = new StreamSource("FibonacciXMLRPC.xsl");
      this.stylesheet = xformFactory.newTemplates(source);
    }
    catch (TransformerException e) { 
      throw new ServletException(
       "Could not load the stylesheet", e); 
    }
    
  }   
  
  // Respond to an XML-RPC request
  public void doPost(HttpServletRequest servletRequest,
   HttpServletResponse servletResponse)
   throws ServletException, IOException {

    servletResponse.setContentType("text/xml; charset=UTF-8");                    
     
    try {
      InputStream in  = servletRequest.getInputStream();
      Source source   = new StreamSource(in);
      PrintWriter out = servletResponse.getWriter();
      Result result   = new StreamResult(out);
      Transformer transformer = stylesheet.newTransformer();
      transformer.transform(source, result);
      out.println();
      out.flush(); 
      servletResponse.flushBuffer();
    }
    catch (TransformerException e) {
      // If we get an exception at this point, it's too late to
      // switch over to an XML-RPC fault. 
      throw new ServletException(e); 
    }
    
  }

}

The init() method simply loads the stylesheet that will transform requests into responses. The doPost() method reads the request and returns the response. The Source is a StreamSource. The result is a StreamResult.

I’m not sure I would recommend this as the proper design for a servlet of this nature. The XSLT transform comes with a lot of overhead. At the least, I would definitely recommend doing the math in Java since XSLT is not optimized for this sort of work. Still, I’m quite impressed with the simplicity and robustness of this code. The thread safety is just the first benefit. Shifting the XML generation into an XSLT document makes the whole program a lot more modular. It’s easy to change the expected input or output format without even recompiling the servlet.

Locating Transformers

The javax.xml.transform.TransformerFactory Java system property determines which XSLT engine TrAX uses. Its value is the fully qualified name of the implementation of the abstract javax.xml.transform.TransformerFactory class. Possible values of this property include:

Saxon 6.x: com.icl.saxon.TransformerFactoryImpl
Saxon 7.x: net.sf.saxon.TransformerFactoryImpl
Xalan: org.apache.xalan.processor.TransformerFactoryImpl
jd.xslt: jd.xml.xslt.trax.TransformerFactoryImpl
Oracle: oracle.xml.jaxp.JXSAXTransformerFactory

This property can be set in all the usual ways a Java system property can be set. TrAX picks from them in this order:

System.setProperty( "javax.xml.transform.TransformerFactory", "classname")
The value specified at the command line using the -Djavax.xml.transform.TransformerFactory=classname option to the java interpreter
The class named in the lib/jaxp.properties properties file in the JRE directory, in a line like this one:
```
javax.xml.transform.TransformerFactory=classname
```
The class named in the META-INF/services/javax.xml.transform.TransformerFactory file in the JAR archives available to the runtime
Finally, if all of the above options fail, TransformerFactory.newInstance() returns a default implementation. In Sun’s JDK 1.4, this is Xalan 2.2d10.
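
When it is not obvious which of these mechanisms won, it can help to ask the factory itself. The following is a small sketch under that assumption; the class name WhichTransformerFactory and the format of the string it prints are made up for illustration.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic class; not part of TrAX
final class WhichTransformerFactory {

  static final String PROPERTY = "javax.xml.transform.TransformerFactory";

  // Reports the class requested through the system property, if any,
  // and the class that TransformerFactory.newInstance() actually returned
  static String describe() {
    String requested = System.getProperty(PROPERTY);
    TransformerFactory xformFactory = TransformerFactory.newInstance();
    String actual = xformFactory.getClass().getName();
    return (requested == null ? "(property not set)" : requested)
     + " -> " + actual;
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }

}
```

Running this with and without the -D option shown above is a quick way to confirm that the processor you intended to install is really the one being loaded.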

The xml-stylesheet processing instruction

XML documents may contain an xml-stylesheet processing instruction in their prologs that specifies the stylesheet to apply to the XML document. At a minimum, this has an href pseudo-attribute specifying the location of the stylesheet to apply and a type pseudo-attribute specifying the MIME media type of the stylesheet. For XSLT stylesheets the proper type is application/xml. For example, this xml-stylesheet processing instruction indicates the XSLT stylesheet found at the relative URL docbook-xsl-1.50.0/fo/docbook.xsl:

<?xml-stylesheet href="docbook-xsl-1.50.0/fo/docbook.xsl" 
                 type="application/xml"?>

Note

Contrary to what some other books will tell you, there is no such MIME media type as text/xsl, nor is it correct to use it as the value of the type pseudo-attribute. This alleged type is a figment of Microsoft’s imagination. It has never been registered with the IANA as MIME types must be. It is not endorsed by the relevant W3C specifications for XSLT and attaching stylesheets to XML documents, and it is not likely to be in the future.

Official registration of an XSLT-specific media type, application/xslt+xml, has begun, and this type may be used in the future to distinguish XSLT stylesheets from other kinds of XML document. However, the registration has not been completed at the time of this writing.

This processing instruction is a hint. It is only a hint. Programs are not required to use the stylesheet the document indicates. They are free to choose a different transform, multiple transforms, or no transform at all. Indeed, the purpose of this processing instruction is primarily browser display. Programs doing something other than loading the document into a browser for a human to read will likely want to use their own XSLT transforms for their own purposes.

Besides the required href and type pseudo-attributes, the xml-stylesheet processing instruction can also have up to four other optional pseudo-attributes:

alternate: no if this stylesheet is the primary stylesheet for the document; yes if it isn’t. The default is no.
media: A string indicating in which kinds of environments this stylesheet should be used. Possible values include screen (the default), tty, tv, projection, handheld, print, braille, aural, and all.
charset: The character encoding of the stylesheet; e.g. ISO-8859-1, UTF-8, or SJIS.
title: A name for the stylesheet.

For example, these xml-stylesheet processing instructions point at two different XSLT stylesheets, one intended for print and found at the relative URL docbook-xsl-1.50.0/fo/docbook.xsl and the other intended for onscreen display and found at docbook-xsl-1.50.0/html/docbook.xsl. Both are the primary stylesheet for their media.

<?xml-stylesheet href="docbook-xsl-1.50.0/fo/docbook.xsl" 
                 type="application/xml"
                 media="print"
                 title="XSL-FO"
                 charset="UTF-8"
                 alternate="no"?>
<?xml-stylesheet href="docbook-xsl-1.50.0/html/docbook.xsl" 
                 type="application/xml"
                 media="screen"
                 title="HTML"
                 charset="UTF-8"
                 alternate="no"?>

The TransformerFactory class has a getAssociatedStylesheet() method that loads the stylesheet indicated by such a processing instruction:

public abstract Source getAssociatedStylesheet(Source xmlDocument, String media, String title, String charset)
    throws TransformerConfigurationException;

This method reads the XML document indicated by the first argument, and looks in its prolog for the stylesheet that matches the criteria given in the other three arguments. If any of these are null, it ignores that criterion. The method then loads the stylesheet matching the criteria into a JAXP Source object and returns it. You can use the TransformerFactory.newTransformer() method to convert this Source into a Transformer object. For example, this code fragment attempts to transform the document read from the InputStream in according to an xml-stylesheet processing instruction for print media found in that document’s prolog. The title and charset of the stylesheet are not considered, and thus set to null.

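// The InputStream in contains the XML document to be transformed.
// Note: getAssociatedStylesheet() reads from in, so unless the stream
// can be reset or reopened, transform() may find it already consumed.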
try {
  Source inputDocument = new StreamSource(in);
  TransformerFactory xformFactory 
   = TransformerFactory.newInstance();
  Source xsl = xformFactory.getAssociatedStylesheet(
   inputDocument, "print", null, null);
  Transformer stylesheet = xformFactory.newTransformer(xsl);

  Result outputDocument = new StreamResult(out);
  stylesheet.transform(inputDocument, outputDocument);
}
catch (TransformerConfigurationException e) {
  System.err.println("Problem with the xml-stylesheet processing instruction"); 
}
catch (TransformerException e) {
  System.err.println("Problem with the stylesheet"); 
}

A TransformerConfigurationException is thrown if there is no xml-stylesheet processing instruction pointing to an XSLT stylesheet matching the specified criteria.

Features

Not all XSLT processors support exactly the same set of capabilities, even within the limits defined by XSLT 1.0. For example, some processors can only transform DOM trees, whereas others may require a sequence of SAX events, and still others may only be able to work with raw streams of text. TrAX uses URI-named features to indicate which of the TrAX classes any given implementation supports. It defines eight standard features as unresolvable URL strings, each of which is also available as a named constant in the relevant TrAX class:

StreamSource.FEATURE: http://javax.xml.transform.stream.StreamSource/feature
StreamResult.FEATURE: http://javax.xml.transform.stream.StreamResult/feature
DOMSource.FEATURE: http://javax.xml.transform.dom.DOMSource/feature
DOMResult.FEATURE: http://javax.xml.transform.dom.DOMResult/feature
SAXSource.FEATURE: http://javax.xml.transform.sax.SAXSource/feature
SAXResult.FEATURE: http://javax.xml.transform.sax.SAXResult/feature
SAXTransformerFactory.FEATURE: http://javax.xml.transform.sax.SAXTransformerFactory/feature
SAXTransformerFactory.FEATURE_XMLFILTER: http://javax.xml.transform.sax.SAXTransformerFactory/feature/xmlfilter

Note

These URLs are just identifiers like namespace URLs. They do not need to be and indeed cannot be resolved. A system does not need to be connected to the Internet to use a transformer that supports these features.

The boolean values of these features for the current XSLT engine can be tested with the getFeature() method in the TransformerFactory class:

public abstract boolean getFeature(String name);

There’s no corresponding setFeature() method because a TrAX feature reflects the nature of the underlying processor. Unlike a SAX feature, it is not something you can just turn on or off with a switch. Either a processor supports DOM input or it doesn’t. Either a processor supports SAX output or it doesn’t, and so on.

Example 17.7 is a simple program that can be used to test an XSLT processor’s support for the standard JAXP 1.1 features.

Example 17.7. Testing the availability of TrAX features

import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.sax.*;


public class TrAXFeatureTester {

  public static void main(String[] args) {
  
    TransformerFactory xformFactory 
     = TransformerFactory.newInstance();
      
    String name = xformFactory.getClass().getName();

    if (xformFactory.getFeature(DOMResult.FEATURE)) {
      System.out.println(name + " supports DOM output."); 
    }
    else {
      System.out.println(name + " does not support DOM output."); 
    }
    if (xformFactory.getFeature(DOMSource.FEATURE)) {
      System.out.println(name + " supports DOM input."); 
    }
    else {
      System.out.println(name + " does not support DOM input."); 
    }
    
    if (xformFactory.getFeature(SAXResult.FEATURE)) {
      System.out.println(name + " supports SAX output."); 
    }
    else {
      System.out.println(name + " does not support SAX output."); 
    }
    if (xformFactory.getFeature(SAXSource.FEATURE)) {
      System.out.println(name + " supports SAX input."); 
    }
    else {
      System.out.println(name + " does not support SAX input."); 
    }
    
    if (xformFactory.getFeature(StreamResult.FEATURE)) {
      System.out.println(name + " supports stream output."); 
    }
    else {
      System.out.println(name + " does not support stream output."); 
    }
    if (xformFactory.getFeature(StreamSource.FEATURE)) {
      System.out.println(name + " supports stream input."); 
    }
    else {
      System.out.println(name + " does not support stream input."); 
    }
    
    if (xformFactory.getFeature(SAXTransformerFactory.FEATURE)) {
      System.out.println(name + " returns SAXTransformerFactory "
       + "objects from TransformerFactory.newInstance()."); 
    }
    else {
      System.out.println(name 
       + " does not use SAXTransformerFactory."); 
    }
    if (xformFactory.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
      System.out.println( 
       name + " supports the newXMLFilter() methods."); 
    }
    else {
      System.out.println( 
       name + " does not support the newXMLFilter() methods."); 
    }
  
  }

}

Here are the results of running this program against Saxon 6.5.1:

C:\XMLJAVA>java -Djavax.xml.transform.TransformerFactory=
com.icl.saxon.TransformerFactoryImpl TrAXFeatureTester
com.icl.saxon.TransformerFactoryImpl supports DOM output.
com.icl.saxon.TransformerFactoryImpl supports DOM input.
com.icl.saxon.TransformerFactoryImpl supports SAX output.
com.icl.saxon.TransformerFactoryImpl supports SAX input.
com.icl.saxon.TransformerFactoryImpl supports stream output.
com.icl.saxon.TransformerFactoryImpl supports stream input.
com.icl.saxon.TransformerFactoryImpl returns 
 SAXTransformerFactory objects from 
 TransformerFactory.newInstance().
com.icl.saxon.TransformerFactoryImpl supports the newXMLFilter() 
 methods.

As you can see, Saxon supports all eight features. Xalan also supports all eight features.
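
In practice, the most common reason to test a feature is to decide whether a cast is safe. As a hedged sketch, the helper below checks SAXTransformerFactory.FEATURE before casting the factory; the class name SaxFactoryLookup and the choice to return null when the feature is missing are assumptions made for this illustration.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;

// Hypothetical helper class; not part of TrAX
final class SaxFactoryLookup {

  // Returns the factory as a SAXTransformerFactory, or null if the
  // processor does not advertise SAXTransformerFactory.FEATURE
  static SAXTransformerFactory asSaxFactory(TransformerFactory factory) {
    if (factory.getFeature(SAXTransformerFactory.FEATURE)) {
      return (SAXTransformerFactory) factory;
    }
    return null;
  }

  public static void main(String[] args) {
    TransformerFactory xformFactory = TransformerFactory.newInstance();
    SAXTransformerFactory saxFactory = asSaxFactory(xformFactory);
    if (saxFactory == null) {
      System.out.println("SAX-specific factory methods are not available.");
    }
    else {
      System.out.println("SAX-specific factory methods are available.");
    }
  }

}
```

Checking the feature first lets a program fall back to plain stream or DOM processing instead of failing with a ClassCastException on a processor that lacks SAX support.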

XSLT Processor Attributes

Some XSLT processors provide non-standard, custom attributes that control their behavior. Like features, these are also named via URIs. For example, Xalan-J 2.3 defines these three attributes:

http://apache.org/xalan/features/optimize: By default, Xalan rewrites stylesheets in an attempt to optimize them (similar to the behavior of an optimizing compiler for Java or other languages). This can confuse tools that need direct access to the stylesheet such as XSLT profilers and debuggers. If you’re using such a tool with Xalan, you should set this attribute to false.
http://apache.org/xalan/features/incremental: Setting this feature to true allows Xalan to begin producing output before it has finished processing the entire input document. This may cause problems if an error is detected late in the process, but it shouldn’t be a big problem in fully debugged and tested environments.
http://apache.org/xalan/features/source_location: Setting this to true tells Xalan to provide a JAXP SourceLocator a program can use to determine the location (line numbers, column numbers, system IDs, and public IDs) of individual nodes during the transform. However, it engenders a substantial performance hit so it’s turned off by default.

Other processors define their own attributes. Although TrAX is designed as a generic API, it does let you access such custom features with these two methods:

public abstract void setAttribute(String name, Object value)
    throws IllegalArgumentException;

public abstract Object getAttribute(String name)
    throws IllegalArgumentException;

For example, this code tries to turn on incremental output:

TransformerFactory xformFactory 
 = TransformerFactory.newInstance();
try {
  xformFactory.setAttribute(
   "http://apache.org/xalan/features/incremental", Boolean.TRUE);
}
catch (IllegalArgumentException e) { 
  // This XSLT processor does not support the
  // http://apache.org/xalan/features/incremental attribute,
  // but we can still use the processor anyway
}

If you’re using any processor except Xalan-J 2.x, this will not exactly fail, but it won’t exactly succeed either. Using non-standard attributes may limit the portability of your programs. However, most attributes (and all of the Xalan attributes) merely adjust how the processor achieves its result. They do not change the final result in any way.

URI Resolution

An XSLT stylesheet can use the document() function to load additional source documents for processing. It can also import or include additional stylesheets with the xsl:import and xsl:include instructions. In all three cases the document to load is identified by a URI.

Normally a Transformer simply loads the document at that URL. However, you can redirect the request to a proxy server, to local copies, or to previously cached copies using a URIResolver. This interface, summarized in Example 17.8, returns Source objects for a specified URL and an optional base. It is similar in intent to SAX’s EntityResolver. However, EntityResolver is based on public and system IDs whereas this interface is based on URLs and base URLs.

Example 17.8. The TrAX URIResolver interface

package javax.xml.transform;

public interface URIResolver {

  public Source resolve(String href, String base) 
   throws TransformerException;
   
}

The resolve() method should return a Source object if it successfully resolves the URL. Otherwise it should return null to indicate that the default URL resolution mechanism should be used. For example, Example 17.9 is a simple URIResolver implementation that looks for a gzipped version of a document (i.e. one that ends in .gz). If it finds one it uses the java.util.zip.GZIPInputStream class to build a StreamSource from the gzipped document. Otherwise, it returns null and the usual methods for resolving URLs are followed.

Example 17.9. A URIResolver class

import javax.xml.transform.*;
import javax.xml.transform.stream.StreamSource;
import java.util.zip.GZIPInputStream;
import java.net.URL;
import java.io.InputStream;


public class GZipURIResolver implements URIResolver {

  public Source resolve(String href, String base) {
   
    try {
      href = href + ".gz";
      URL context = new URL(base);
      URL u = new URL(context, href); 
      InputStream in = u.openStream();
      GZIPInputStream gin = new GZIPInputStream(in);
      return new StreamSource(gin, u.toString());
    }
    catch (Exception e) {
      // If anything goes wrong, just return null and let
      // the default resolver try.
    }
    return null;
  }

}

The following two methods in TransformerFactory set and get the URIResolver that Transformer objects created by this factory will use to resolve URIs:

public abstract void setURIResolver(URIResolver resolver);
public abstract URIResolver getURIResolver();

For example,

URIResolver resolver = new GZipURIResolver();
factory.setURIResolver(resolver);
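
A resolver does not have to touch the network or the file system at all. As one possible sketch of the "previously cached copies" case, the class below serves documents from an in-memory map and defers to the default mechanism for everything else. The class name InMemoryURIResolver and its register() method are hypothetical, and matching on the raw href string (ignoring the base) is a simplification.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver that serves cached documents from memory
final class InMemoryURIResolver implements URIResolver {

  private final Map<String, String> documents = new HashMap<>();

  // Associates an href, exactly as it is written in the stylesheet,
  // with the text of an XML document
  void register(String href, String xml) {
    documents.put(href, xml);
  }

  @Override
  public Source resolve(String href, String base) {
    String xml = documents.get(href);
    if (xml == null) {
      // Unknown href; let the default resolution mechanism try
      return null;
    }
    // Pass the href as the system ID so error messages can name the document
    return new StreamSource(new StringReader(xml), href);
  }

}
```

It would be installed exactly like the GZipURIResolver, after calling register() once for each document that should be answered from memory.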

Error Handling

XSLT transformations can fail for any of several reasons, including:

The stylesheet is syntactically incorrect.
The source document is malformed.
Some external resource the processor needs to load, such as a document referenced by the document() function or the .class file that implements an extension function, is not available.

By default, any such problems are reported by printing them on System.err. However, you can provide more sophisticated error handling, reporting, and logging by implementing the ErrorListener interface. This interface, shown in Example 17.10, is modeled after SAX’s ErrorHandler interface. Indeed, aside from the fact that the arguments are all TransformerExceptions instead of SAXExceptions, it’s almost identical.

Example 17.10. The TrAX ErrorListener interface

package javax.xml.transform;

public interface ErrorListener {

  public void warning(TransformerException exception)
   throws TransformerException;
  public void error(TransformerException exception)
   throws TransformerException;
  public void fatalError(TransformerException exception)
   throws TransformerException;
     
}

Example 17.11 demonstrates with a simple class that uses the java.util.logging package introduced in Java 1.4 to report errors rather than printing them on System.err. Each exception is logged to a Logger specified in the constructor. Unfortunately, the Logging API doesn’t really have separate categories for fatal and non-fatal errors, so I just classify them both as “severe”. (You could define a custom subclass of Level that did differentiate fatal and non-fatal errors; but since this is not a book about the Logging API, I leave that as an exercise for the reader.)

Example 17.11. An ErrorListener that uses the Logging API

import javax.xml.transform.*;
import java.util.logging.*;


public class LoggingErrorListener implements ErrorListener {

  private Logger logger;
  
  public LoggingErrorListener(Logger logger) {
    this.logger = logger;
  }
  
  public void warning(TransformerException exception) {
   
    logger.log(Level.WARNING, exception.getMessage(), exception);
   
    // Don't throw an exception and stop the processor
    // just for a warning; but do log the problem
  }
  
  public void error(TransformerException exception)
   throws TransformerException {
    
    logger.log(Level.SEVERE, exception.getMessage(), exception);
    // XSLT is not as draconian as XML. There are numerous errors
    // which the processor may but does not have to recover from; 
    // e.g. multiple templates that match a node with the same
    // priority. I do not want to allow that so I throw this 
    // exception here.
    throw exception;
    
  }
  
  public void fatalError(TransformerException exception)
   throws TransformerException {
    
    logger.log(Level.SEVERE, exception.getMessage(), exception);

    // This is an error which the processor cannot recover from; 
    // e.g. a malformed stylesheet or input document
    // so I must throw this exception here.
    throw exception;
    
  }
     
}

The following two methods appear in both TransformerFactory and Transformer. They enable you to set and get the ErrorListener that the object will report problems to:

public abstract void setErrorListener(ErrorListener listener)
    throws IllegalArgumentException;

public abstract ErrorListener getErrorListener();

An ErrorListener registered with a Transformer will report errors with the transformation. An ErrorListener registered with a TransformerFactory will report errors with the factory’s attempts to create new Transformer objects. For example, this code fragment installs separate LoggingErrorListeners on the TransformerFactory and the Transformer object it creates that will record messages in two different logs.

TransformerFactory factory = TransformerFactory.newInstance();
Logger factoryLogger 
 = Logger.getLogger("com.macfaq.trax.factory");
ErrorListener factoryListener 
 = new LoggingErrorListener(factoryLogger);
factory.setErrorListener(factoryListener);
Source source = new StreamSource("FibonacciXMLRPC.xsl");
Transformer stylesheet = factory.newTransformer(source);
Logger transformerLogger 
 = Logger.getLogger("com.macfaq.trax.transformer");
ErrorListener transformerListener 
 = new LoggingErrorListener(transformerLogger);
stylesheet.setErrorListener(transformerListener);
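
Logging is not the only option. When a program needs to inspect the problems afterwards, for instance to show them to a user, an ErrorListener can simply collect them. The following variation is a sketch only; the class name CollectingErrorListener and its accessor methods are invented for illustration, and it keeps the same policy as Example 17.11 of continuing after warnings but rethrowing errors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;

// Hypothetical listener that remembers problems instead of logging them
final class CollectingErrorListener implements ErrorListener {

  private final List<String> warnings = new ArrayList<>();
  private final List<String> errors = new ArrayList<>();

  @Override
  public void warning(TransformerException exception) {
    // Record the warning but let the processor continue
    warnings.add(exception.getMessage());
  }

  @Override
  public void error(TransformerException exception)
   throws TransformerException {
    errors.add(exception.getMessage());
    // Same policy as Example 17.11: do not allow recovery
    throw exception;
  }

  @Override
  public void fatalError(TransformerException exception)
   throws TransformerException {
    errors.add(exception.getMessage());
    throw exception;
  }

  List<String> getWarnings() {
    return Collections.unmodifiableList(warnings);
  }

  List<String> getErrors() {
    return Collections.unmodifiableList(errors);
  }

}
```

Because a listener like this accumulates state, it is probably best to give each Transformer its own instance rather than sharing one across threads.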

Passing Parameters to Style Sheets

Top-level xsl:param and xsl:variable elements both define variables by binding a name to a value. Such a variable can be dereferenced elsewhere in the stylesheet using the form $name. Once set, the value of an XSLT variable is fixed and cannot be changed. However, if the variable is defined with a top-level xsl:param element instead of an xsl:variable element, then the default value can be changed before the transformation begins.

For example, the DocBook XSL stylesheets I use to generate this book have a number of parameters that set various formatting options. For this book I use these settings:

  <xsl:param name="fop.extensions">1</xsl:param>
  <xsl:param name="page.width.portrait">7.375in</xsl:param>
  <xsl:param name="page.height.portrait">9.25in</xsl:param>
  <xsl:param name="page.margin.top">0.5in</xsl:param>
  <xsl:param name="page.margin.bottom">0.5in</xsl:param>
  <xsl:param name="region.before.extent">0.5in</xsl:param>
  <xsl:param name="body.margin.top">0.5in</xsl:param>
  <xsl:param name="page.margin.outer">1.0in</xsl:param>
  <xsl:param name="page.margin.inner">1.0in</xsl:param>
  <xsl:param name="body.font.family">Times</xsl:param>
  <xsl:param name="variablelist.as.blocks" select="1"/>
  <xsl:param name="generate.section.toc.level" select="1"/>
  <xsl:param name="generate.component.toc" select="0"/>

The initial (and thus final) value of any parameter can be changed inside your Java code using these three methods of the Transformer class:

public abstract void setParameter(String name, Object value);
public abstract Object getParameter(String name);
public abstract void clearParameters();

The setParameter() method provides a value for a parameter that overrides any value used in the stylesheet itself. The processor is responsible for converting the Java object type passed to a reasonable XSLT equivalent. This should work well enough for String, Integer, Double, and Boolean as well as DOM types like Node and NodeList. However, I wouldn’t rely on it for anything more complex like a File or a Frame.

The getParameter() method returns the value of a parameter previously set by Java. It will not return any value from the stylesheet itself, even if it has not been overridden by the Java code. Finally, the clearParameters() method eliminates all Java mappings of parameters so that those variables are returned to whatever value is specified in the stylesheet.

For example, in Java the above list of parameters for the DocBook stylesheets could be set with a JAXP Transformer object like this:

transformer.setParameter("fop.extensions", "1");
transformer.setParameter("page.width.portrait", "7.375in");
transformer.setParameter("page.height.portrait", "9.25in");
transformer.setParameter("page.margin.top", "0.5in");
transformer.setParameter("region.before.extent", "0.5in");
transformer.setParameter("body.margin.top", "0.5in");
transformer.setParameter("page.margin.bottom", "0.5in");
transformer.setParameter("page.margin.outer", "1.0in");
transformer.setParameter("page.margin.inner", "1.0in");
transformer.setParameter("body.font.family", "Times");
transformer.setParameter("variablelist.as.blocks", "1");
transformer.setParameter("generate.section.toc.level", "1");
transformer.setParameter("generate.component.toc", "0");

Here I used strings for all the values. However, in a few cases I could have used a Number of some kind instead.
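The interplay of setParameter() and clearParameters() is easier to see on a small scale. The following is a minimal sketch, not taken from the DocBook stylesheets: the inline stylesheet, its greeting parameter, and the ParameterDemo class are all hypothetical names invented for illustration. It runs the same Transformer three times: with the stylesheet default, with a value supplied from Java, and again after clearParameters().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParameterDemo {

  // Hypothetical stylesheet: one top-level parameter with a default value,
  // copied to the output as plain text.
  static final String STYLESHEET =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:param name='greeting'>default</xsl:param>"
    + "<xsl:template match='/'>"
    + "<xsl:value-of select='$greeting'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Runs the transform against a trivial input document and returns the text.
  static String transform(Transformer transformer)
      throws TransformerException {
    StringWriter out = new StringWriter();
    transformer.transform(
        new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
    return out.toString();
  }

  // Returns the output with the stylesheet default, with a Java override,
  // and after clearParameters() has removed the override.
  static String[] run() {
    try {
      Transformer transformer = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
      String withDefault = transform(transformer);
      transformer.setParameter("greeting", "overridden");
      String withOverride = transform(transformer);
      transformer.clearParameters();
      String afterClear = transform(transformer);
      return new String[] {withDefault, withOverride, afterClear};
    } catch (TransformerException ex) {
      // Wrapped only to keep the sketch free of checked exceptions.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    for (String output : run()) {
      System.out.println(output);
    }
  }
}
```

The sketch passes a String because that is the conversion every processor can be expected to handle; other object types depend on the processor, as noted above.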

Output Properties

XSLT is defined in terms of a transformation from one tree to a different tree, all of which takes place in memory. The actual conversion of that tree to a stream of bytes or a file is an optional step. If that step is taken, the xsl:output instruction controls the details of serialization. For example, it can specify XML, HTML, or plain text output. It can specify the encoding of the output, what the document type declaration points to, whether the elements should be indented, what the value of the standalone declaration is, where CDATA sections should be used, and more. For example, adding this xsl:output element to a stylesheet would produce plain text output instead of XML:

<xsl:output
  method="text"
  encoding="US-ASCII"
  media-type="text/plain"
/>

This xsl:output element asks for pretty-printed XML:

<xsl:output
  method="xml"
  encoding="UTF-16"
  indent="yes"
  media-type="text/xml"
  standalone="yes"
/>

In all, there are ten attributes of the xsl:output element that control serialization of the result tree:

method="xml | html | text": The output method. xml is the default. html uses classic HTML syntax such as <hr> instead of <hr />. text outputs plain text but no markup.
version="1.0": The version number used in the XML declaration. Currently, this should always have the value 1.0.
encoding="UTF-8 | UTF-16 | ISO-8859-1 | …": The encoding used for the output and in the encoding declaration of the output document.
omit-xml-declaration="yes | no": yes if the XML declaration should be omitted, no if it should be included. The default is no.
standalone="yes | no": The value of the standalone attribute for the XML declaration; either yes or no.
doctype-public="public ID": The public identifier used in the DOCTYPE declaration.
doctype-system="URI": The URI used as a system identifier in the DOCTYPE declaration.
cdata-section-elements="element_name_1 element_name_2 …": A white space separated list of the qualified names of the elements whose content should be output as CDATA sections.
indent="yes | no": yes if extra white space should be added to pretty-print the result, no otherwise. The default is no.
media-type="text/xml | text/html | text/plain | application/xml | …": The MIME media type of the output, such as text/html, application/xml, or image/svg+xml.

Note

All of these output properties are at the discretion of the XSLT processor. The processor is not required to serialize the result tree at all, much less to serialize it with extra white space, a document type declaration, and so forth. In particular, I have encountered XSLT processors that only partially support indent="yes".

You can also control these output properties from inside your Java programs using these four methods in the Transformer class. You can either set them one by one or as a group with the java.util.Properties class.

public abstract void setOutputProperties(Properties outputFormat)
    throws IllegalArgumentException;

public abstract Properties getOutputProperties();

public abstract void setOutputProperty(String name, String value)
    throws IllegalArgumentException;

public abstract String getOutputProperty(String name);

The keys and values for these properties are simply the string names established by the XSLT 1.0 specification. For convenience, the javax.xml.transform.OutputKeys class shown in Example 17.12 provides named constants for all the property names.

Example 17.12. The TrAX OutputKeys class

package javax.xml.transform;

public class OutputKeys {

  private OutputKeys() {}

  public static final String METHOD = "method";
  public static final String VERSION = "version";
  public static final String ENCODING = "encoding";
  public static final String OMIT_XML_DECLARATION 
   = "omit-xml-declaration";
  public static final String STANDALONE = "standalone";
  public static final String DOCTYPE_PUBLIC = "doctype-public";
  public static final String DOCTYPE_SYSTEM = "doctype-system";
  public static final String CDATA_SECTION_ELEMENTS 
   = "cdata-section-elements";
  public static final String INDENT = "indent";
  public static final String MEDIA_TYPE = "media-type";
  
}

For example, this Java code fragment has the same effect as the second xsl:output element above, the one that asks for pretty-printed XML:

transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.MEDIA_TYPE, "text/xml");
transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");

In the event of a conflict between what the Java code requests with output properties and what the stylesheet requests with an xsl:output element, the properties specified in the Java code take precedence.
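The precedence rule can be checked with a small experiment. This is a sketch under stated assumptions: the inline stylesheet and the OutputPropertiesDemo class are hypothetical, and the exact serialization may differ between processors. The stylesheet explicitly asks for an XML declaration; the Java code then asks for it to be omitted, first with setOutputProperty() and then as a group with setOutputProperties().

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputPropertiesDemo {

  // Hypothetical stylesheet that explicitly requests an XML declaration.
  static final String STYLESHEET =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='xml' omit-xml-declaration='no'/>"
    + "<xsl:template match='/'><out/></xsl:template>"
    + "</xsl:stylesheet>";

  static Transformer newTransformer() throws TransformerException {
    return TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
  }

  static String transform(Transformer transformer)
      throws TransformerException {
    StringWriter out = new StringWriter();
    transformer.transform(
        new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
    return out.toString();
  }

  // Returns three serializations: stylesheet settings only, one property
  // overridden from Java, and properties overridden as a group.
  static String[] run() {
    try {
      String fromStylesheet = transform(newTransformer());

      Transformer single = newTransformer();
      single.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      String oneByOne = transform(single);

      Transformer grouped = newTransformer();
      Properties properties = new Properties();
      properties.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      properties.setProperty(OutputKeys.INDENT, "no");
      grouped.setOutputProperties(properties);
      String asGroup = transform(grouped);

      return new String[] {fromStylesheet, oneByOne, asGroup};
    } catch (TransformerException ex) {
      // Wrapped only to keep the sketch free of checked exceptions.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    for (String output : run()) {
      System.out.println(output);
    }
  }
}
```

Only the first result should begin with an XML declaration; in the other two the Java request wins over the xsl:output element.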

Sources and Results

The Source and Result interfaces abstract out the API-dependent details of exactly how an XML document is represented. You can construct sources from DOM nodes, SAX event sequences, and raw streams. You can target the result of a transform at a DOM Node, a SAX ContentHandler, or a stream-based target such as an OutputStream, Writer, File, or String. Other models may also provide their own implementations of these interfaces. For instance, JDOM has an org.jdom.transform package that includes a JDOMSource and JDOMResult class.

In fact, these different models have very little in common, other than that they all hold an XML document. Consequently, the Source and Result interfaces don’t themselves provide much of the functionality you need, just methods to get and set the system ID of the document. Everything else is deferred to the implementations. XSLT engines generally need to work directly with the implementing classes rather than with the generic interfaces, and not all engines are able to process all three kinds of sources and targets. Polymorphism just doesn’t work very well here.

Note

It is important to set at least the system IDs of your sources because some parts of the stylesheet may rely on this. In particular, if any of your xsl:import or xsl:include elements or document() functions contain relative URLs, then they’ll be resolved relative to the URL of the stylesheet source.

DOMSource and DOMResult

A DOMSource is a wrapper around a DOM Node. The DOMSource class provides methods to set and get the node that serves as the root of the transform, as well as the system ID of that node.

Example 17.13. The TrAX DOMSource class

package javax.xml.transform.dom;

public class DOMSource implements Source {

  public static final String FEATURE =
    "http://javax.xml.transform.dom.DOMSource/feature";

  public DOMSource();
  public DOMSource(Node node);
  public DOMSource(Node node, String systemID);

  public void   setNode(Node node);
  public Node   getNode();
  public void   setSystemId(String baseID);
  public String getSystemId();

}

In theory, you should be able to convert any DOM Node object into a DOMSource and transform it. In practice, only transforming document nodes is truly reliable. (It’s not even clear that the XSLT processing model applies to anything that isn’t a complete document.) In my tests, Xalan-J could transform all the nodes I threw at it. However, Saxon could only transform Document objects and Element objects that were part of a document tree.

A DOMResult is a wrapper around a DOM Document, DocumentFragment, or Element Node to which the output of the transform will be appended. The DOMResult class provides constructors and methods to set and get the node that receives the output of the transform, as well as the system ID of that node.

Example 17.14. The TrAX DOMResult class

package javax.xml.transform.dom;

public class DOMResult implements Result {

  public static final String FEATURE =
  "http://javax.xml.transform.dom.DOMResult/feature";

  public DOMResult();
  public DOMResult(Node node);
  public DOMResult(Node node, String systemID);
  
  public void setNode(Node node);
  public Node getNode();
  public void setSystemId(String systemId);
  public String getSystemId();
  
}

If you specify a Node for the result, either via the constructor or by calling setNode(), then the output of the transform will be appended to that node’s children. Otherwise, the transform output will be appended to a new Document or DocumentFragment Node. The getNode() method returns this Node.
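Both cases can be tried out with a short program. This is only a sketch: the DomToDomDemo class and the greeting and wrapper element names are hypothetical, and it assumes the identity Transformer returned by the no-argument TransformerFactory.newTransformer() method rather than a real stylesheet. The first transform uses an empty DOMResult; the second appends to an existing Element.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomToDomDemo {

  // Builds a tiny source document: <greeting>hi</greeting>
  static Document newSourceDocument(DocumentBuilder builder) {
    Document doc = builder.newDocument();
    Element root = doc.createElementNS(null, "greeting");
    root.appendChild(doc.createTextNode("hi"));
    doc.appendChild(root);
    return doc;
  }

  // Returns the name of the first node in a fresh result, then the name
  // and text of the node appended to an existing wrapper element.
  static String[] run() {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      Transformer identity = TransformerFactory.newInstance().newTransformer();

      // Case 1: no node specified, so the processor creates one.
      DOMResult fresh = new DOMResult();
      identity.transform(new DOMSource(newSourceDocument(builder)), fresh);
      String freshRoot = fresh.getNode().getFirstChild().getNodeName();

      // Case 2: the output is appended to the children of the given node.
      Document target = builder.newDocument();
      Element wrapper = target.createElementNS(null, "wrapper");
      target.appendChild(wrapper);
      identity.transform(
          new DOMSource(newSourceDocument(builder)), new DOMResult(wrapper));
      Node appended = wrapper.getFirstChild();

      return new String[] {
        freshRoot, appended.getNodeName(), appended.getTextContent()
      };
    } catch (ParserConfigurationException | TransformerException ex) {
      // Wrapped only to keep the sketch free of checked exceptions.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    for (String value : run()) {
      System.out.println(value);
    }
  }
}
```

The source is a complete Document in both cases, since that is the kind of node processors handle most reliably.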

SAXSource and SAXResult

The SAXSource class, shown in Example 17.15, provides input to the XSLT processor read from a SAX InputSource by an XMLReader.

Example 17.15. The TrAX SAXSource class

package javax.xml.transform.sax;

public class SAXSource implements Source {

  public static final String FEATURE =
   "http://javax.xml.transform.sax.SAXSource/feature";

  public SAXSource();
  public SAXSource(XMLReader reader, InputSource inputSource);
  public SAXSource(InputSource inputSource);
  
  public void        setXMLReader(XMLReader reader);
  public XMLReader   getXMLReader();
  public void        setInputSource(InputSource inputSource);
  public InputSource getInputSource();
  public void        setSystemId(String systemID);
  public String      getSystemId();
  
  public static InputSource sourceToInputSource(Source source);
  
}

The SAXResult class, shown in Example 17.16, receives output from the XSLT processor as a stream of SAX events fired at a specified ContentHandler and optional LexicalHandler.

Example 17.16. The TrAX SAXResult class

package javax.xml.transform.sax;

public class SAXResult implements Result {

  public static final String FEATURE =
   "http://javax.xml.transform.sax.SAXResult/feature";

  public SAXResult();
  public SAXResult(ContentHandler handler);
  
  public void           setHandler(ContentHandler handler);
  public ContentHandler getHandler();
  public void           setLexicalHandler(LexicalHandler handler);
  public LexicalHandler getLexicalHandler();
  public void           setSystemId(String systemId);
  public String         getSystemId();
  
}
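These two classes can be combined into a pure SAX pipeline. The following is a sketch under stated assumptions: the SaxPipelineDemo class and its ElementCounter handler are hypothetical, and an identity Transformer stands in for a real stylesheet. The input is read from a SAX InputSource, and the output arrives as events at a ContentHandler that merely counts start-tags.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxPipelineDemo {

  // Hypothetical handler that counts the startElement events it receives.
  static class ElementCounter extends DefaultHandler {
    int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) {
      count++;
    }
  }

  // Feeds the XML text through an identity transform and returns the
  // number of elements reported to the ContentHandler.
  static int countElements(String xml) {
    try {
      Transformer identity = TransformerFactory.newInstance().newTransformer();
      ElementCounter counter = new ElementCounter();
      // No XMLReader is supplied here, so the processor chooses its own.
      SAXSource source = new SAXSource(new InputSource(new StringReader(xml)));
      identity.transform(source, new SAXResult(counter));
      return counter.count;
    } catch (TransformerException ex) {
      // Wrapped only to keep the sketch free of checked exceptions.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(countElements("<a><b/><b/></a>"));
  }
}
```

Because the result is never serialized, output properties such as indent and encoding play no part in this arrangement.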

StreamSource and StreamResult

The StreamSource and StreamResult classes are used as sources and targets for transforms from sequences of bytes and characters. This includes streams, readers, writers, strings, and files. What unifies these is that none of them know they contain an XML document. Indeed, on input they may not always contain an XML document. If they don’t, an exception will be thrown as soon as you attempt to build a Transformer or a Templates object from the StreamSource.

The StreamSource class, shown in Example 17.17, provides constructors and methods to get and set the actual source of data.

Example 17.17. The TrAX StreamSource class

package javax.xml.transform.stream;

public class StreamSource implements Source {

  public static final String FEATURE =
   "http://javax.xml.transform.stream.StreamSource/feature";

  public StreamSource();
  public StreamSource(InputStream inputStream);
  public StreamSource(InputStream inputStream, String systemID);
  public StreamSource(Reader reader);
  public StreamSource(Reader reader, String systemID);
  public StreamSource(String systemID);
  public StreamSource(File f);
  
  public void        setInputStream(InputStream inputStream);
  public InputStream getInputStream();
  public void        setReader(Reader reader);
  public Reader      getReader();
  public void        setPublicId(String publicID);
  public String      getPublicId();
  public void        setSystemId(String systemID);
  public String      getSystemId();
  public void        setSystemId(File f);
  
}

You should not specify both an InputStream and a Reader. If you do, which one the processor reads from is implementation dependent. If neither an InputStream nor a Reader is available, then the processor will attempt to open a connection to the URI specified by the system ID. You should set the system ID even if you do specify an InputStream or a Reader because this will be needed to resolve relative URLs that appear inside the stylesheet and input document.
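A sketch of this advice in practice follows. The StreamSourceDemo class, the inline stylesheet, and the www.example.com URLs are hypothetical; nothing is fetched from those URLs because the data comes from the Readers, and the system IDs serve only as base URLs.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StreamSourceDemo {

  // Hypothetical stylesheet that copies the text of the root element.
  static final String STYLESHEET =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'><xsl:value-of select='doc'/></xsl:template>"
    + "</xsl:stylesheet>";

  // Returns the stylesheet's system ID, the system ID derived from a File,
  // and the output of the transform.
  static String[] run() {
    try {
      // Data comes from the Reader; the system ID is only a base URL for
      // any relative references inside the stylesheet or document.
      StreamSource stylesheet = new StreamSource(
          new StringReader(STYLESHEET),
          "http://www.example.com/styles/demo.xsl");
      StreamSource input = new StreamSource(
          new StringReader("<doc>hello</doc>"),
          "http://www.example.com/docs/input.xml");

      // A File-based source derives its system ID from the file name.
      // The file is not opened here and need not exist.
      StreamSource fromFile = new StreamSource(new File("input.xml"));

      Transformer transformer =
          TransformerFactory.newInstance().newTransformer(stylesheet);
      StringWriter out = new StringWriter();
      transformer.transform(input, new StreamResult(out));

      return new String[] {
        stylesheet.getSystemId(), fromFile.getSystemId(), out.toString()
      };
    } catch (TransformerException ex) {
      // Wrapped only to keep the sketch free of checked exceptions.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    for (String value : run()) {
      System.out.println(value);
    }
  }
}
```

Each source carries exactly one of an InputStream or a Reader, so there is no ambiguity about which the processor reads.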

The StreamResult class, shown in Example 17.18, provides constructors and methods to get and set the actual target of the data.

Example 17.18. The TrAX StreamResult class

package javax.xml.transform.stream;

public class StreamResult implements Result {

  public static final String FEATURE =
   "http://javax.xml.transform.stream.StreamResult/feature";

  public StreamResult();
  public StreamResult(OutputStream outputStream);
  public StreamResult(Writer writer);
  public StreamResult(String systemID);
  public StreamResult(File f);
  
  public void         setOutputStream(OutputStream outputStream);
  public OutputStream getOutputStream();
  public void         setWriter(Writer writer);
  public Writer       getWriter();
  public void         setSystemId(String systemID);
  public void         setSystemId(File f);
  public String       getSystemId();
  
}

You should specify the system ID URL and one of the other targets (File, OutputStream, Writer, or String). If you specify more than one possible target, which one the processor chooses is implementation dependent.
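As a sketch of the two most common targets, the following program sends the same document first to a Writer and then to an OutputStream. The StreamResultDemo class is hypothetical, and an identity Transformer stands in for a real stylesheet; each StreamResult is given exactly one target. The XML declaration is omitted so that the two results can be compared directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StreamResultDemo {

  // A fresh source is needed for each transform because a Reader can
  // only be consumed once.
  static StreamSource newInput() {
    return new StreamSource(new StringReader("<doc>hello</doc>"));
  }

  // Returns the document as serialized to a Writer and to an OutputStream.
  static String[] run() {
    try {
      Transformer identity = TransformerFactory.newInstance().newTransformer();
      identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

      // Character target: the Writer receives chars, not encoded bytes.
      StringWriter writer = new StringWriter();
      identity.transform(newInput(), new StreamResult(writer));

      // Byte target: the processor encodes the output itself.
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      identity.transform(newInput(), new StreamResult(bytes));
      String decoded = new String(bytes.toByteArray(), StandardCharsets.UTF_8);

      return new String[] {writer.toString(), decoded};
    } catch (TransformerException ex) {
      // Wrapped only to keep the sketch free of checked exceptions.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    for (String output : run()) {
      System.out.println(output);
    }
  }
}
```

With an OutputStream the encoding output property determines the bytes written; with a Writer, the Writer itself decides how characters are eventually encoded.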

Copyright 2001, 2002 Elliotte Rusty Harold	elharo@metalab.unc.edu	Last Modified June 19, 2002