XQuery

eXQuisite or eXcruciating?

Elliotte Rusty Harold

Weekend with Experts

Saturday, December 10, 2005

elharo@metalab.unc.edu

http://www.cafeconleche.org/


Versions Covered


XQuery

Three parts:


XQuery Language


Documents to Query


Physical Representations to Query


Where is XQuery used?


The XML Model vs. the Relational Model

Relational model | XML model
A relational database contains tables | An XML database contains collections
A relational table contains records with the same schema | A collection contains XML documents with the same DTD
A relational record is an unordered list of named values | An XML document is a tree of nodes
A SQL query returns an unordered set of records | An XQuery returns an ordered sequence of nodes

Query Data Types

XPath 2.0 Data Model Type Hierarchy

Picture taken from XQuery 1.0 and XPath 2.0 Functions and Operators W3C Working Draft 3 November 2005


An example document to query

Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:

<?xml version="1.0"?>
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>

  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases


The XQuery FLWOR

photo of flower

Query: List titles of all books

   for $t in doc("bib.xml")/bib/book/title
   return
      $t 

Adapted from XML Query Use Cases


Query Result: Book Titles

% java -classpath saxon8.jar net.sf.saxon.Query query1
<?xml version="1.0" encoding="UTF-8"?>
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
<title>Data on the Web</title>
<title>The Economics of Technology and Content for Digital TV</title>

XQueryX


Specifying a context node


Query Result with wrapping


Serialization Format


XPath 1.0 Data Model

(Adapted from Jeni Tennison)


XPath 2.0 Data Model

(Adapted from Jeni Tennison)


Constructing sequences


Sequence Math


Sequence example

for $a in (1 to 10)
return $a

Output:

1
2
3
4
5
6
7
8
9
10
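
Sequences are always flat, so nested parentheses do not create nested sequences. A small sketch (not from the original slides) that combines a range, a parenthesized sequence, and a single item, then filters and transforms the items:

```xquery
(: (1 to 3, (4, 5), 10) flattens to 1, 2, 3, 4, 5, 10 :)
for $a in (1 to 3, (4, 5), 10)
where $a mod 2 = 0
return $a * $a
```

This should return the sequence 4, 16, 100.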

Data types and the PSVI


Element Constructors

List titles of all books in a bib element. Put each title in a book element.

<bib>
  {
   for $title in doc("bib.xml")/bib/book/title
   return
    <book>
     { $title }
    </book>
  }
</bib>

Adapted from XML Query Use Cases


Query Result: Book Titles

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book>
      <title>TCP/IP Illustrated</title>
   </book>
   <book>
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book>
      <title>Data on the Web</title>
   </book>
   <book>
      <title>The Economics of Technology and Content for Digital TV</title>
   </book>
</bib>

Attribute Constructors
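
The query for this slide did not survive in this copy. A reconstruction that should produce the result shown on the next slide, assuming the same bib.xml input, copies each book's year into an attribute value with an enclosed expression:

```xquery
<bib>
 {
   for $b in doc("bib.xml")/bib/book
   return
    (: the braces inside the attribute value are evaluated, not copied :)
    <book year="{ $b/@year }">
     { $b/title }
    </book>
 }
</bib>
```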

Adapted from XML Query Use Cases


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book year="1994">
      <title>TCP/IP Illustrated</title>
   </book>
   <book year="1992">
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book year="2000">
      <title>Data on the Web</title>
   </book>
   <book year="1999">
      <title>The Economics of Technology and Content for Digital TV</title>
   </book>
</bib>

Text Constructors
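
The query for this slide is missing here. One possible reconstruction, assuming the heading text is built with a computed text constructor, is sketched below; writing the literal text Bibliography directly inside the h1 element would give the same result.

```xquery
<bib>
  <h1>{ text { "Bibliography" } }</h1>
  {
    for $t in doc("bib.xml")/bib/book/title
    return $t
  }
</bib>
```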


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <h1>Bibliography</h1>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
   <title>Data on the Web</title>
   <title>The Economics of Technology and Content for Digital TV</title>
</bib>

Other Constructors
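
The query for this slide is missing here. A sketch of how the expected result might be built with computed document, processing-instruction, and comment constructors follows; it is an assumption about the original query, and the whitespace in the serialized comment may differ from the expected result.

```xquery
document {
  (: the processing instruction goes before the root element :)
  processing-instruction xml-stylesheet
    { 'type="application/xml" href="bibliography.css"' },
  <bib>
    <h1>Bibliography</h1>
    {
      for $t in doc("bib.xml")/bib/book/title
      return $t
    }
  </bib>,
  (: the comment goes after the root element :)
  comment { " An example from Elliotte Rusty Harold's XQuery presentation " }
}
```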


Expected Query Result

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="application/xml" href="bibliography.css"?>
<bib>
   <h1>Bibliography</h1>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
   <title>Data on the Web</title>
   <title>The Economics of Technology and Content for Digital TV</title>
</bib>
<!-- An example from Elliotte Rusty Harold's 
  XQuery presentation -->

Query with where
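
The query for this slide is missing here. A reconstruction that should produce the result on the next slide filters the books with a where clause:

```xquery
<bib>
 {
   for $b in doc("bib.xml")/bib/book
   where $b/publisher = "Addison-Wesley"
   return $b/title
 }
</bib>
```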

Adapted from XML Query Use Cases


Query Result: Titles of books published by Addison-Wesley

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases


Query with Booleans
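
The query for this slide is missing here. A reconstruction that should produce the result on the next slide combines two conditions with and:

```xquery
<bib>
 {
   for $b in doc("bib.xml")/bib/book
   (: without a schema, @year is untyped and is compared as a number :)
   where $b/publisher = "Addison-Wesley" and $b/@year < 1993
   return $b/title
 }
</bib>
```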

Adapted from XML Query Use Cases


Query Result: books published by Addison-Wesley before 1993

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases


Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
 {
   for $book in doc("bib.xml")/bib/book,
     $title in $book/title,
     $author in $book/author
   return
    <result>
    { $title }
    { $author }
    </result>
  }
</results>

Adapted from XML Query Use Cases


Query Result: A list of all the title-author pairs

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </result>
</results>

Adapted from XML Query Use Cases


Nested FLWOR Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
 {
   for $b in doc("bib.xml")/bib/book
     return
      <result>
       { $b/title }
       {  
         for $a in $b/author
         return $a
       }
      </result>
 }
</results>

Adapted from XML Query Use Cases


Query Result: A list of the title and authors of each book in the bibliography

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </result>
   <result>
      <title>The Economics of Technology and Content for Digital TV</title>
   </result>
</results>

Adapted from XML Query Use Cases


Query with let
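
The query for this slide is missing here. A reconstruction that should produce the result on the next slide, modeled on the if then else query that follows it, binds the document and the average price with let:

```xquery
<results>
  {
   let $doc := doc("bib.xml")
   let $average := avg($doc//price)
   for $b in $doc/bib/book
     return
       <data>
         { $b/title } is { $b/price - $average }
         more expensive than the average.
       </data>
  }
</results>
```

Books that cost less than the average get a negative difference, which is what the if then else version cleans up.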


Query Result: price differences

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <data>
      <title>TCP/IP Illustrated</title> is -9.5 more expensive than the average. </data>
   <data>
      <title>Advanced Programming in the Unix Environment</title> is -9.5 more expensive than the average. </data>
   <data>
      <title>Data on the Web</title> is -35.5 more expensive than the average. </data>
   <data>
      <title>The Economics of Technology and Content for Digital TV</title> is 54.499999999999986 more expensive than the average. </data>
</results>

if then else

For each book in the bibliography, list the difference between the book's price and the average price, but this time indicate whether the book is more or less expensive than the average.

<results> 
  {
   let $doc := doc("bib.xml")
   let $average := avg($doc//price)
   for $b in $doc/bib/book
     return
       if ($b/price > $average) then
         <data>
           { $b/title } is ${$b/price - $average} 
           more expensive than the average.
         </data>
       else  
         <data>
           { $b/title } is ${$average - $b/price} 
           less expensive than the average.
         </data>
  }    
</results>

Query Result: Price differences

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <data>
      <title>TCP/IP Illustrated</title> is $9.5 less expensive than the average.</data>
   <data>
      <title>Advanced Programming in the Unix Environment</title> is $9.5 less expensive than the average.</data>
   <data>
      <title>Data on the Web</title> is $35.5 less expensive than the average.</data>
   <data>
      <title>The Economics of Technology and Content for Digital TV</title> is $54.499999999999986 more expensive than the average.</data>
</results>

Query with sorting

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

<bib>
 {
   for $b in doc("bib.xml")//book[publisher = "Addison-Wesley" and @year > 1991]
   order by ($b/title)
   return
    <book>
     { $b/@year } { $b/title }
    </book> 
 }
</bib>

Adapted from XML Query Use Cases


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book year="1992">
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book year="1994">
      <title>TCP/IP Illustrated</title>
   </book>
</bib>

Adapted from XML Query Use Cases


ORDER BY modifiers

<bib>
 {
   for $b in doc("bib.xml")//book[publisher = "Addison-Wesley"]
   order by ($b/title) descending
   return
    <book>
     { $b/@year } { $b/title }
    </book> 
 }
</bib>

Adapted from XML Query Use Cases


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book year="1994">
      <title>TCP/IP Illustrated</title>
   </book>
   <book year="1992">
      <title>Advanced Programming in the Unix Environment</title>
   </book>
</bib>

Adapted from XML Query Use Cases


A different document about books

Sample data at "reviews.xml":

<?xml version="1.0"?>
<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
       A very good discussion of semi-structured database
       systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>

Adapted from XML Query Use Cases


This document uses a different DTD

<!ELEMENT reviews (entry*)>
<!ELEMENT entry   (title, price, review)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT price   (#PCDATA)>
<!ELEMENT review  (#PCDATA)>

Query that joins two documents

For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

<books-with-prices>
 {
   for $b in doc("bib.xml")//book,
     $a in doc("reviews.xml")//entry
   where $b/title = $a/title
   return
    <book-with-prices>
     { $b/title },
       <price-amazon> { $a/price/text() } </price-amazon>
       <price-bn> { $b/price/text() } </price-bn>
    </book-with-prices>
 }
</books-with-prices>

Adapted from XML Query Use Cases


Result

<?xml version="1.0" encoding="UTF-8"?>
<books-with-prices>
   <book-with-prices>
      <title>TCP/IP Illustrated</title>,
       <price-amazon>65.95</price-amazon>
      <price-bn>65.95</price-bn>
   </book-with-prices>
   <book-with-prices>
      <title>Advanced Programming in the Unix Environment</title>,
       <price-amazon>65.95</price-amazon>
      <price-bn>65.95</price-bn>
   </book-with-prices>
   <book-with-prices>
      <title>Data on the Web</title>,
       <price-amazon>34.95</price-amazon>
      <price-bn>39.95</price-bn>
   </book-with-prices>
</books-with-prices>

Adapted from XML Query Use Cases


prices.xml Query Sample Data

The next query also uses an input document named "prices.xml":

<?xml version="1.0"?>
<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>


Adapted from XML Query Use Cases


Query with reused variables

<results>
 {
   let $doc := doc("prices.xml")
   for $t in distinct-values($doc/prices/book/title)
     let $p := $doc/prices/book[title = $t]/price
     return
       <minprice title="{$t}">
         { min($p) }
       </minprice>
 }
</results>

Adapted from XML Query Use Cases


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <minprice title="Advanced Programming in the Unix Environment">65.95</minprice>
   <minprice title="TCP/IP Illustrated">65.95</minprice>
   <minprice title="Data on the Web">34.95</minprice>
</results>

Adapted from XML Query Use Cases


Multiple FLWOR Queries

<bib>
 {
   for $b in doc("bib.xml")//book[author]
   return
    <book>
     { $b/title }
     { $b/author }
    </book>,
   for $b in doc("bib.xml")//book[editor]
   return
    <reference>
     { $b/title }
     <org> { $b/editor/affiliation/text() } </org>
    </reference>
 }
</bib>

Adapted from XML Query Use Cases


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </book>
   <book>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </book>
   <book>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </book>
   <reference>
      <title>The Economics of Technology and Content for Digital TV</title>
      <org>CITI</org>
   </reference>
</bib>

Adapted from XML Query Use Cases


Querying documents that use namespaces


Query Software


What's the difference between XQuery and XSLT?


XPath 2.0


XPath 2.0 Goals


Held over from XPath 1.0


Accessor Functions

fn:node-name(Node)
returns zero or one QName
fn:string(Object)
returns the string value of anything
fn:data(Node)
returns a sequence of zero or more typed simple values
fn:base-uri(node)
returns the base URI of an Element or Document node
fn:document-uri(node)
returns the document URI of a Document node; for any other kind of node it returns the empty sequence
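
A brief sketch of these accessors applied to the first book in bib.xml. This is an illustration rather than part of the original slides, and the URIs returned depend on where the processor loaded the document from:

```xquery
let $book := doc("bib.xml")/bib/book[1]
return (
  node-name($book),         (: the QName book :)
  string($book/title),      (: the string TCP/IP Illustrated :)
  data($book/@year),        (: the value 1994, untyped when no schema is applied :)
  base-uri($book),          (: base URI of the element :)
  document-uri(root($book)) (: document URI of the containing document node :)
)
```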

Constructor Functions


Casting

if ($x castable as xs:gYear) then 
  $x cast as xs:gYear
else if ($x castable as xs:integer) then 
  $x cast as xs:integer
else if ($x castable as xs:decimal) then 
  $x cast as xs:decimal
else 
  $x cast as xs:string

Four kinds of comparison operators


Value comparison operators


General comparisons


Node comparisons


Order comparisons


Functions and operators


Arithmetic operators


Numeric Functions


String functions


Regular expressions


Boolean Functions and Operators


Date and time functions


Qualified Name Functions


Node Functions


Sequence Functions


Sequence size Functions

fn:zero-or-one($arg as item()*) => item()?
Returns $arg if it contains zero or one items. Otherwise, raises an error.
fn:one-or-more($arg as item()*) => item()+
Returns $arg if it contains one or more items. Otherwise, raises an error.
fn:exactly-one($arg as item()*) => item()
Returns $arg if it contains exactly one item. Otherwise, raises an error.
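
A short sketch of these functions against bib.xml (an illustration, not from the original slides). The chosen book has one title, three authors, and no editor, so none of the three calls raises an error:

```xquery
let $book := doc("bib.xml")/bib/book[title = "Data on the Web"]
return (
  exactly-one($book/title),   (: one title: returned unchanged :)
  one-or-more($book/author),  (: three authors: returned unchanged :)
  zero-or-one($book/editor)   (: no editor: returns the empty sequence :)
)
```

By contrast, zero-or-one($book/author) would raise an error for this book, because it has more than one author.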

Context Functions


Other New features in XPath 2.0


XPath Comments

<xsl:apply-templates 
 select="(: The difference between the context node and the 
             current node is crucial here :)
 ../composition[@composer=current()/@id]"/>

Namespace wildcards

<xsl:template match="*:set">
  This matches MathML set elements, SVG set elements, set
  elements in no namespace at all, etc. 
</xsl:template>

Can use functions as location steps


Can use parenthesized expressions as location steps


Dereference steps


For Expressions


for Example

Consider the list of weblogs at http://static.userland.com/weblogMonitor/logs.xml

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
    <log>
        <name>MozillaZine</name>
        <url>http://www.mozillazine.org</url>
        <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
        <ownerName>Jason Kersey</ownerName>
        <ownerEmail>kerz@en.com</ownerEmail>
        <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
        <imageUrl></imageUrl>
        <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
    </log>
    <log>
        <name>SalonHerringWiredFool</name>
        <url>http://www.salonherringwiredfool.com/</url>
        <ownerName>Some Random Herring</ownerName>
        <ownerEmail>salonfool@wiredherring.com</ownerEmail>
        <description></description>
    </log>
    <log>
        <name>SlashDot.Org</name>
        <url>http://www.slashdot.org/</url>
        <ownerName>Simply a friend</ownerName>
        <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
        <description>News for Nerds, Stuff that Matters.</description>
    </log>
</weblogs>

The changesUrl element points to a document like this:

<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" 
                     "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>MozillaZine</title>
    <link>http://www.mozillazine.org/</link>
    <language>en-us</language>
    <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
    <copyright>Copyright 1998-2002, The MozillaZine Organization</copyright>
    <managingEditor>jason@mozillazine.org</managingEditor>
    <webMaster>jason@mozillazine.org</webMaster>
    <image>
      <title>MozillaZine</title>
      <url>http://www.mozillazine.org/image/mynetscape88.gif</url>
      <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
      <link>http://www.mozillazine.org/</link>
    </image>

    <item>
      <title>BugDays Are Back!</title>
      <link>http://www.mozillazine.org/talkback.html?article=2151</link>
    </item>

    <item>
      <title>Independent Status Reports</title>
      <link>http://www.mozillazine.org/talkback.html?article=2150</link>
    </item>

  </channel>

</rss>

We want to process all the item elements from each weblog.


for Example


<xsl:template match="weblogs">
  <xsl:apply-templates select="
    for $url in log/changesUrl
    return doc($url)//item
  "/>
</xsl:template>

Conditional Expressions

Not all weblogs have a changesUrl, so fall back to the url element when it is missing:

<xsl:template match="log">
  <xsl:apply-templates select="
    if (changesUrl)
     then document(changesUrl)
     else document(url)"/>
</xsl:template>

Quantified Expressions

<xsl:template match="weblogs">
  <xsl:if test="some $log in log satisfies $log/changesUrl">
     At least one log has a changesUrl
  </xsl:if>
</xsl:template>

<xsl:template match="weblogs">
  <xsl:if test="every $log in log satisfies $log/url">
    Every log has a url
  </xsl:if>
</xsl:template>
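
The same quantified expressions work in XQuery. A sketch against the bib.xml document used earlier (an illustration, not from the original slides); note that the test after satisfies has to refer to the range variable:

```xquery
let $books := doc("bib.xml")/bib/book
return (
  (: true: one book has an editor :)
  some $b in $books satisfies $b/editor,
  (: false: the book with an editor has no author :)
  every $b in $books satisfies $b/author
)
```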

To Learn More


Copyright 2002-2005 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified December 10, 2005