XML Fundamentals



Elliotte Rusty Harold

Monday, March 14, 2005

elharo@metalab.unc.edu

http://www.cafeconleche.org/


Outline


Part I: XML Overview

XML succeeded, and in ways that weren't expected - at least not by many. Originally it was conceived as a document-oriented technology for robust quality publishing of documents over networks. The original workplan had three pillars - XML syntax, XML link, and XML stylesheets. Schemas were not high on the agenda and XML was not seen as an infrastructure for middleware or glueware. It was expected that at some stage it would be necessary to manage data but there was little activity in this area in 1997. When developing Chemical Markup Language (which must be one of the first published XML applications), I found the lack of datatypes very frustrating!

Well, XML is now a basic infrastructure of much modern information. I doubt that anyone now designs a protocol, or operating system without including XML. Although this list sometimes complains that XML isn't as clean as we would like, it works, and it works pretty well.

--Peter Murray-Rust on the xml-dev mailing list, Thursday, February 7, 2002


What is XML?


XML is a Meta Markup Language


XML describes structure and semantics, not formatting


A Song Description in HTML

<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
<li>Jacques Morali
<li>PolyGram Records
<li>6:20
<li>1978
<li>Village People
</ul>

A Song Description in XML

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

The XML Declaration


Elements


Cascading Style Sheets


CSS Stylesheet for Songs

SONG     {display: block}
TITLE    {display: block; font-family: Helvetica, sans-serif;
          font-size: 20pt; font-weight: bold}
COMPOSER {display: block;
          font-family: Times, "Times New Roman", serif;
          font-size: 14pt;
          font-style: italic}
ARTIST   {display: block;
          font-family: Times, "Times New Roman", serif;
          font-size: 14pt; font-weight: bold;
          font-style: italic}
PUBLISHER {display: block;
           font-family: Times, "Times New Roman", serif;
           font-size: 14pt}
LENGTH    {display: block;
           font-family: Times, "Times New Roman", serif;
           font-size: 14pt}
YEAR      {display: block;
           font-family: Times, "Times New Roman", serif;
           font-size: 14pt}


Attaching style sheets to documents

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="song1.css"?>
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>


song.xsl

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="SONG">
    <html>
      <body>
       <h1>
        <xsl:value-of select="TITLE"/> 
        by the 
        <xsl:value-of select="ARTIST"/> 
       </h1>
       <ul>
         <xsl:apply-templates select="COMPOSER"/>
         <li>Publisher: <xsl:value-of select="PUBLISHER"/></li>
         <li>Year: <xsl:value-of select="YEAR"/></li>
         <li>Producer: <xsl:value-of select="PRODUCER"/></li>
       </ul>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="COMPOSER">
    <li>Composer: <xsl:value-of select="."/></li>
  </xsl:template>

</xsl:stylesheet>

Applying an XSLT Style Sheet


Output

<html>
   <body>
      <h1>Hot Cop 
         by the 
         Village People
      </h1>
      <ul>
         <li>Composer: Jacques Morali</li>
         <li>Composer: Henri Belolo</li>
         <li>Composer: Victor Willis</li>
         <li>Publisher: PolyGram Records</li>
         <li>Year: 1978</li>
         <li>Producer: Jacques Morali</li>
      </ul>
   </body>
</html>

Editing and Saving XML Documents


Well-formedness

Rules:


Validity

To be valid, an XML document must:

  1. Be well-formed

  2. Have a Document Type Definition (DTD)

  3. Comply with the constraints specified in the DTD


A DTD for Songs

<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, 
                 LENGTH?, YEAR?, ARTIST+)>

<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

A Valid Song Document

<?xml version="1.0"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

Checking Validity

To check validity, you pass the document through a validating parser, which should report any errors it finds. For example, an invalid document:

% java dom.Counter -v invalidhotcop.xml
[Error] invalidhotcop.xml:10:8: The content of element type "SONG" must match 
"(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?,ARTIST+)".
invalidhotcop.xml: 862;70;0 ms (7 elems, 0 attrs, 19 spaces, 59 chars)

A valid document:

% java dom.Counter -v validhotcop.xml
validhotcop.xml: 671;70;0 ms (10 elems, 0 attrs, 28 spaces, 98 chars)
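
As a rough illustration of what the validator is checking here: a sequence content model such as the one for SONG behaves much like a regular expression over the names of the child elements. The Python sketch below applies that idea with the standard library only. It is not a validating parser and not what dom.Counter does internally. The helper names content_model_to_regex and matches_content_model are made up for this example, and the code assumes a flat sequence with only the ?, * and + suffixes.

```python
import re
import xml.etree.ElementTree as ET

SONG_MODEL = "(TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)"

def content_model_to_regex(model):
    """Turn a flat DTD sequence like (A, B+, C?) into a compiled regex.

    Each child name becomes a group matching "NAME " that keeps
    its ?, * or + suffix. (Hypothetical helper, illustration only.)
    """
    parts = []
    for item in model.strip().strip("()").split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "?*+" else ""
        name = item.rstrip("?*+")
        parts.append("(?:%s )%s" % (re.escape(name), suffix))
    return re.compile("".join(parts))

def matches_content_model(element, model):
    """Check the element's child names, in order, against the model."""
    children = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(children) is not None

valid = ET.fromstring(
    "<SONG><TITLE>Hot Cop</TITLE><COMPOSER>Jacques Morali</COMPOSER>"
    "<YEAR>1978</YEAR><ARTIST>Village People</ARTIST></SONG>")
# No ARTIST, and YEAR comes before COMPOSER.
invalid = ET.fromstring(
    "<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR>"
    "<COMPOSER>Jacques Morali</COMPOSER></SONG>")

print(matches_content_model(valid, SONG_MODEL))
print(matches_content_model(invalid, SONG_MODEL))
```

A real validating parser also checks attribute declarations, nested content models, choices, and mixed content, none of which this sketch attempts.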

A More Complex Example

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "expanded_song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

Attributes

  <PHOTO 
    xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />

Empty-element Tags

  <PHOTO 
    xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />

Comments

<!-- You can tell what album I was listening to when I wrote this example -->


Namespaces

<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <ARTIST>Village People</ARTIST>
</SONG>
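
To see what the namespace declarations do to the names, here is a small Python sketch using the standard library's ElementTree, which reports each qualified name as {namespace-URI}local-name. The document is a shortened copy of the one above. The variable names are only for this example.

```python
import xml.etree.ElementTree as ET

SONG_NS = "http://www.cafeconleche.org/namespace/song"
XLINK_NS = "http://www.w3.org/1999/xlink"

doc = """<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
</SONG>"""

root = ET.fromstring(doc)

# The default namespace applies to SONG and its unprefixed descendants.
print(root.tag)

# The xlink: prefix puts href in the XLink namespace, so the lookup
# has to use the namespace URI rather than the prefix.
publisher = root.find("{%s}PUBLISHER" % SONG_NS)
href = publisher.get("{%s}href" % XLINK_NS)
name = publisher.text.strip()
print(href, name)
```

The prefix itself never appears in the parsed names. Only the URI it is bound to matters.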

Entity References

A &amp; M Records


A More Complex DTD

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER  (#PCDATA)>
<!ELEMENT PRODUCER  (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

Part II: What is XML Good For?


What is XML used for?


Domain-Specific Markup Languages


Self-Describing Data


An XML Fragment

<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>
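
Because the markup names each piece of data, a program can pull out fields by name without knowing anything about layout. A minimal Python sketch with the standard library's ElementTree follows. The dictionary keys are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

fragment = """<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>"""

person = ET.fromstring(fragment)

# Each value is found by the element names on the path to it.
record = {
    "id": person.get("ID"),
    "given": person.findtext("NAME/GIVEN"),
    "surname": person.findtext("NAME/SURNAME"),
    "born": person.findtext("BIRTH/DATE"),
    "died": person.findtext("DEATH/DATE"),
}
print(record)
```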

Interchange of Data Among Applications


Can assemble data from multiple sources


XML Applications


Example XML Applications


Mathematical Markup Language

<?xml version="1.0"?>
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
            "../xhtml1/transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Fiat Lux</title>
</head>
<body>

<p>
And God said,
</p>

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msub>
      <mi>&delta;</mi>
      <mi>&alpha;</mi>
    </msub>
    <msup>
      <mi>F</mi>
      <mi>&alpha;&beta;</mi>
    </msup>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mn>4</mn>
        <mi>&pi;</mi>
      </mrow>
      <mi>c</mi>
    </mfrac>
    <msup>
      <mi>J</mi>
      <mrow>
        <mi>&beta;</mi>
      </mrow>
    </msup>
  </mrow>
</math>

<p>
and there was light.
</p>
</body>
</html>


RSS

????

Today's News on Cafe con Leche

Books

DocBook
OpenOffice
TEI

Vector Graphics

An SVG document

SOAP


WSDL


Database interchange and export


XML for XML


XSL: The Extensible Stylesheet Language


Schemas


W3C XML Schema Language Example

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE" type="xsd:string" 
                   minOccurs="1" maxOccurs="1"/>
      <xsd:element name="COMPOSER"  type="xsd:string" 
                   minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
                   minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
                   minOccurs="0" maxOccurs="1"/>
    
      <xsd:element name="LENGTH" type="xsd:duration" 
                   minOccurs="0" maxOccurs="1"/>
      <xsd:element name="YEAR"   type="xsd:gYear" 
                   minOccurs="1" maxOccurs="1"/>
  
      <xsd:element name="ARTIST" type="xsd:string" 
                   minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>

XLinks

<footnote xlink:type="simple" xlink:href="footnote7.xml">7</footnote>

File Formats, in-house applications, and other behind the scenes uses


Part III: A Practical Example


A larger example: Music Catalog


Sample Catalog

http://www.ibiblio.org/nywc/
Offline version/

Organizing the Data


What is the Root Element


The Root Element

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  Everything else will go here...
</catalog>

What are the Immediate Children of the Root?


Child Elements

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

</catalog>

White space in XML is not especially significant

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog><category>Small chamber ensembles 
- 2-4 Players by New York Women Composers</category></catalog>
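
One caveat worth making concrete: the parser itself passes all the white space in character data through to the application. The two layouts are equivalent only once the application collapses runs of white space, as a browser does when rendering. A Python sketch with the standard library follows. The helper normalized_text is a made-up name for this example.

```python
import xml.etree.ElementTree as ET

indented = """<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

</catalog>"""

compact = """<catalog><category>Small chamber ensembles 
- 2-4 Players by New York Women Composers</category></catalog>"""

def normalized_text(xml_text, path):
    """Return the text at path with runs of white space collapsed."""
    text = ET.fromstring(xml_text).findtext(path)
    return " ".join(text.split())

# The raw character data differs between the two layouts...
raw_differs = (ET.fromstring(indented).findtext("category")
               != ET.fromstring(compact).findtext("category"))
# ...but the normalized text is the same.
same_after_normalizing = (normalized_text(indented, "category")
                          == normalized_text(compact, "category"))
print(raw_differs, same_after_normalizing)
```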

Composers

Each composer has a name

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <composer>
    <name>Julie Mandel</name>
  </composer>

  <composer>
    <name>Margaret De Wys</name>
  </composer>  
    
  <composer>
    <name>Beth Anderson</name>
  </composer>
    
  <composer>
    <name>Linda Bouchard</name>
  </composer>

</catalog>

Grand Children

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <composer>
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer>

  <composer>
    <name>
      <first_name>Margaret</first_name> 
      <middle_name>De</middle_name> 
      <last_name>Wys</last_name>
    </name>
  </composer>  
    
  <composer>
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composer>
    <name>
      <first_name>Linda</first_name> 
      <middle_name></middle_name> 
      <last_name>Bouchard</last_name>
    </name>
  </composer>

</catalog>

Attributes

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <composer id="c1">
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer>

  <composer id="c2">
    <name>
      <first_name>Margaret</first_name> 
      <middle_name>De</middle_name> 
      <last_name>Wys</last_name>
    </name>
  </composer>  
    
  <composer id="c3">
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composer id="c4">
    <name>
      <first_name>Linda</first_name> 
      <middle_name></middle_name> 
      <last_name>Bouchard</last_name>
    </name>
  </composer>

</catalog>

Attributes vs. Elements


When not to use attributes


Compositions

Let's look at an example of what we want:

Rendered HTML:

Brass Swale (1988) 5", tbn, 2 Bfl tpts, bar. hn

Tonal. Commissioned/Premiered by the Redlands' New Music Ensemble. (A swale is a meadow or a marsh where a lot of wild plants grow together. The composer discovered the word when a horse named Swale won the Kentucky Derby several years ago. Since her work is primarily collage of newly composed musical swatches, she has used the name extensively.) ACA - American Composers Alliance

Or in HTML:

<dt><cite>Brass Swale</cite> (1988) 5", tbn, 2 Bfl tpts, bar. hn</dt>
<dd><p>
Tonal. Commissioned/Premiered by the Redlands' New Music 
Ensemble. (A swale is a meadow or a marsh where a lot of 
wild plants grow together. The composer discovered the word 
when a horse named Swale won the Kentucky Derby several 
years ago. Since her work is primarily collage of newly 
composed musical swatches, she has used the name 
extensively.)  ACA - American Composers 
Alliance</p>
</dd>

Each composition has a title, date, length, list of instruments, description, and publisher


Composition Example in XML

  <composition>
    <title>Brass Swale</title>
    <date>1988</date> 
    <length>5"</length>
    <instruments>tbn, 2 Bfl tpts, bar. hn</instruments>
    <description>
      Tonal. Commissioned/Premiered by the Redlands' New Music
      Ensemble. (A swale is a meadow or a marsh where a lot of
      wild plants grow together. The composer discovered the word
      when a horse named Swale won the Kentucky Derby several
      years ago. Since her work is primarily collage of newly
      composed musical swatches, she has used the name
      extensively.)
    </description>
    <publisher>ACA - American Composers Alliance</publisher>
  </composition>

Further Divisions

  <composition>
    <title>Trio for Flute, Viola and Harp</title>
    <date><year>1994</year></date> 
    <length>13'38"</length>
    <instruments>fl, hp, vla</instruments>
    <description>
      <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, 
      Christine Ims, and Susan Jolles. In 3 movements :</p>
      <ul>
        <li>mvt. 1: 5:01</li>
        <li>mvt. 2: 4:11</li>
        <li>mvt. 3: 4:26</li>
      </ul>  
    </description>
    <publisher>Theodore Presser</publisher>
  </composition>

Attaching the Composer to the Composition

  <composition composer="c3">
    <title>Trio: Dream in D</title>
    <date><year>1980</year></date> 
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      Rhapsodic. Passionate. Available on CD 
      <cite><a href=
       "http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/">
       Two by Three
      </a></cite> from North/South Consonance (1998).
    </description> 
    <publisher></publisher>
  </composition>
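
The composer attribute only becomes useful when something follows it back to the composer element with the matching id. One way a program might do that is sketched below in Python with the standard library, on a cut-down catalog. The function composer_name is a hypothetical helper for this example, not part of any catalog tool.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""<catalog>
  <composer id="c3">
    <name>
      <first_name>Beth</first_name>
      <middle_name></middle_name>
      <last_name>Anderson</last_name>
    </name>
  </composer>
  <composition composer="c3">
    <title>Trio: Dream in D</title>
  </composition>
</catalog>""")

# Index the composers by their id attribute.
composers = {c.get("id"): c for c in catalog.findall("composer")}

def composer_name(composition):
    """Look up the composer a composition points to and join the name parts."""
    name = composers[composition.get("composer")].find("name")
    parts = [(part.text or "").strip() for part in name]
    return " ".join(p for p in parts if p)

for composition in catalog.findall("composition"):
    print(composition.findtext("title"), "-", composer_name(composition))
```

Keeping the composer in one place and pointing to it by id avoids repeating the full name inside every composition.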

Some Keywords For the Search Engines

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <cataloging_info>
    <abstract>Compositions by the members of New York Women Composers</abstract>
    <keyword>music publishing</keyword>
    <keyword>scores</keyword>
    <keyword>women composers</keyword>
    <keyword>New York</keyword>
  </cataloging_info>

  <composer id="c1">
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer>
  
  ...
  
</catalog>

Standard Signature

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
...
  <last_updated>July 28, 1999</last_updated>
  <copyright>1999 New York Women Composers</copyright>
  <maintainer email="elharo@metalab.unc.edu" 
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
  </maintainer>

</catalog>

CSS Style Sheet for the Catalog

category { display: block; 
          font-family: Helvetica, Arial, sans-serif;
          font-size: 32pt; 
          font-weight: bold; 
          text-align: center
         }
       
catalog { font-family: "New York", "Times New Roman", serif; 
          font-size: 14pt; 
          background-color: white; 
          color: black; 
          display: block
        }
      
composer { display: block; 
           font-family: Helvetica, Arial, sans-serif;
           font-size: 24pt; 
           font-weight: bold; 
           text-align: left
         }  
       
composition title { display: block; 
       font-family: Helvetica, Arial, sans-serif;
       font-size: 18pt; 
       font-weight: bold; 
       text-align: left}
       
composition * {display:list-item}
       
description {display: block}
              
/* cataloging_info is only for search engines */
cataloging_info { display: none;
       color: #FFFFFF}
       
last_updated, copyright, maintainer {display: block;
       font-size: small}
       
copyright:before {content: "Copyright " }

last_updated:before {content: "Last Modified " }

last_updated {margin-top: 2ex }


Possible Extensions


Possible Solutions


XSLT Style Sheet for the Catalog

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
       <xsl:apply-templates select="catalog"/>
    </html>
  </xsl:template>

  <xsl:template match="catalog">
    <head>
       <title><xsl:value-of select="category"/></title>      
    </head>
    <body>
       <!-- Header -->            
       <h1><xsl:value-of select="category"/></h1>
       <ul>
         <xsl:for-each select="composition">
           <xsl:sort select="title"/>
           <li>
             <a href="#{generate-id()}">
               <xsl:value-of select="title"/>
             </a>
           </li>
         </xsl:for-each>
       </ul>
       
       <!-- Body -->            
       <xsl:apply-templates select="composer">
         <xsl:sort select="name/last_name"/>
         <xsl:sort select="name/first_name"/>
         <xsl:sort select="name/middle_name"/>
       </xsl:apply-templates>
       
       <!-- Signature -->      
       <hr/>
       Copyright <xsl:value-of select="copyright"/><br/>
       Last Modified: <xsl:value-of select="last_updated"/><br/>
       <xsl:apply-templates select="maintainer"/>
    </body>
  </xsl:template>
  
  <xsl:template match="composer">
    <h2><xsl:value-of select="name"/></h2>
    <xsl:apply-templates select="../composition[@composer=current()/@id]">
       <xsl:sort select="title"/>      
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="maintainer">
    <a href="{@url}"><xsl:value-of select="name"/></a><br/>
    <a href="mailto:{@email}"><xsl:value-of select="@email"/></a>
  </xsl:template>

  <xsl:template match="composition">
    <h3><xsl:number value="position()"/>.
      <a name="{generate-id()}">
        <xsl:value-of select="title"/>
      </a>
    </h3>

    <ul>
     <xsl:if test="string(date)">
       <li><xsl:value-of select="date"/></li>
     </xsl:if>
     <xsl:if test="string(length)">
       <li><xsl:value-of select="length"/></li>
     </xsl:if>
     <xsl:if test="string(instruments)">
       <li><xsl:value-of select="instruments"/></li>
     </xsl:if>
     <xsl:if test="string(publisher)">
       <li><xsl:value-of select="publisher"/></li>
     </xsl:if>    
    </ul>

    <p><xsl:apply-templates select="description"/></p>    
    
  </xsl:template>

  <!-- pass unrecognized nodes along unchanged -->
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>  
  
  
</xsl:stylesheet>


CSS or XSL?


Part IV: Well-formedness


Well-formedness Rules


Open and close all tags


Empty-element tags end with />


There is a unique root element


Elements may not overlap


Attribute values are quoted
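
A quick way to try the rules listed so far is to hand small documents to any XML parser and see which ones it rejects. The Python sketch below uses the standard library's ElementTree. The helper is_well_formed and the lang attribute are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "well-formed":        '<SONG><TITLE lang="en">Hot Cop</TITLE><PHOTO/></SONG>',
    "missing end-tag":    '<SONG><TITLE>Hot Cop</SONG>',
    "two root elements":  '<SONG/><SONG/>',
    "overlapping":        '<SONG><TITLE><ARTIST>Hot Cop</TITLE></ARTIST></SONG>',
    "unquoted attribute": '<SONG><TITLE lang=en>Hot Cop</TITLE></SONG>',
}
for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample should be accepted. A conforming parser must report each of the others as an error rather than guess at what was meant.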


< and & are only used to start tags and entities
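
When a program generates XML, the practical consequence is that text has to be escaped before it goes between tags or into an attribute value. A sketch using the escaping helpers in Python's standard library is shown below. The NOTE element and its text attribute are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

publisher = "A & M Records"
note = "2 < 3"

# escape() replaces &, < and > in character data.
element = "<PUBLISHER>%s</PUBLISHER>" % escape(publisher)
print(element)

# quoteattr() escapes the value and adds the surrounding quotes.
tagged = "<NOTE text=%s/>" % quoteattr(note)
print(tagged)

# Parsing gives the original text back.
print(ET.fromstring(element).text)
print(ET.fromstring(tagged).get("text"))
```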


Only the five predefined entity references are used
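
The five are &lt;, &gt;, &amp;, &apos; and &quot;. In a document with no DTD, any other entity reference is an error, including ones familiar from HTML. A small Python check with the standard library's ElementTree:

```python
import xml.etree.ElementTree as ET

# The five predefined references need no DTD.
text = ET.fromstring("<t>&lt; &gt; &amp; &apos; &quot;</t>").text
print(text)

# &copy; is familiar from HTML but is not declared here,
# so the parser rejects the document.
try:
    ET.fromstring("<copyright>&copy; 1999</copyright>")
    accepted = True
except ET.ParseError:
    accepted = False
print(accepted)
```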


Character References
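
Character references name a character by its Unicode code point, in decimal (&#169;) or hexadecimal (&#xA9;), and need no declaration. The Python sketch below, using the standard library's ElementTree, shows both forms resolving to the same character and the serializer producing a reference again when the output encoding cannot represent it directly.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same character, U+00A9.
element = ET.fromstring("<copyright>&#169; &#xA9;</copyright>")
both_resolve = element.text == "\u00a9 \u00a9"
print(both_resolve)

# Serializing to ASCII cannot write the character itself,
# so it is written as a numeric character reference instead.
serialized = ET.tostring(element, encoding="us-ascii").decode("ascii")
print(serialized)
```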


Part V: DTDs


Well-formedness vs. validity


DTDs and Validity


What is a DTD?


Validity

To be valid, an XML document must:

  1. Be well-formed

  2. Have a document type declaration

  3. Comply with the constraints specified in the DTD


A Valid Song Document

<?xml version="1.0"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

Checking Validity

To check validity, you pass the document through a validating parser, which should report any errors it finds. For example, an invalid document:

% java sax.Counter -v invalidhotcop.xml
Error at (file file:/D:/speaking/SD99EAST/dtds/invalidhotcop.xml, line 10, char 8):
Element "<SONG>" is not valid because it does not follow the rule,
"(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?,ARTIST+)".
invalidhotcop.xml: 281 ms

A valid document:

% java sax.Counter -v validhotcop.xml
validhotcop.xml: 170 ms

Internal DTD Subsets

<?xml version="1.0"?>
<!DOCTYPE SONG [
  <!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, 
                  LENGTH?, YEAR?, ARTIST+)>

  <!ELEMENT TITLE (#PCDATA)>

  <!ELEMENT COMPOSER (#PCDATA)>
  <!ELEMENT PRODUCER (#PCDATA)>
  <!ELEMENT PUBLISHER (#PCDATA)>
  <!ELEMENT LENGTH (#PCDATA)>
  <!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
  <!ELEMENT YEAR (#PCDATA)>

  <!ELEMENT ARTIST (#PCDATA)>
]>
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

The importance of validation


Where are DTDs Important?


Domain-Specific Markup Languages


Self-Describing Data

<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

Interchange of Data Among Applications


Structured and Integrated Data


An Example Document

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <cataloging_info>
    <abstract>Compositions by the members of New York Women Composers</abstract>
    <keyword>music publishing</keyword>
    <keyword>scores</keyword>
    <keyword>women composers</keyword>
    <keyword>New York</keyword>
  </cataloging_info>

  <last_updated>July 28, 1999</last_updated>
  <copyright>1999 New York Women Composers</copyright>
  <maintainer email="elharo@metalab.unc.edu" 
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
  </maintainer>

  <composer id="c1">
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer>

  <composer id="c2">
    <name>
      <first_name>Margaret</first_name> 
      <middle_name>De</middle_name> 
      <last_name>Wys</last_name>
    </name>
  </composer>  
    
  <composer id="c3">
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composer id="c4">
    <name>
      <first_name>Linda</first_name> 
      <middle_name></middle_name> 
      <last_name>Bouchard</last_name>
    </name>
  </composer>
    
  <composition composer="c1">
    <title>Trio for Flute, Viola and Harp</title>
    <date><year>(1994)</year></date> 
    <length>13'38"</length>
    <instruments>fl, hp, vla</instruments>
    <description>
      <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, 
      Christine Ims, and Susan Jolles. In 3 movements :</p>
      <ul>
        <li>mvt. 1: 5:01</li>
        <li>mvt. 2: 4:11</li>
        <li>mvt. 3: 4:26</li>
      </ul>  
    </description>
    <publisher>Theodore Presser</publisher>
  </composition>

  <composition composer="c2">
    <title>Charmonium</title>
    <date><year>(1991)</year></date> 
    <length>9'</length>
    <instruments>2 vln, vla, vc</instruments>
    <description>
      Commissioned as quartet for the Meridian String Quartet. 
      Sonorous, bold. Moderate difficulty. Tape available.
    </description> 
    <publisher></publisher>
  </composition>

  <composition composer="c1">
    <title>Invention for Flute and Piano</title>
    <date><year>(1994)</year></date> 
    <length></length>
    <instruments>fl, pn</instruments>
    <description>3 movements</description> 
    <publisher></publisher>
  </composition>

  <composition composer="c3">
    <title>Little Trio</title>
    <date><year>(1984)</year></date> 
    <length>4'</length>
    <instruments>fl, guit, va</instruments>
    <description></description> 
    <publisher>ACA</publisher>
  </composition>

  <composition composer="c3">
    <title>Dr. Blood's Mermaid Lullaby</title>
    <date><year>(1980)</year></date> 
    <length>3'</length>
    <instruments>fl or ob, or vn, or vc, pn</instruments>
    <description></description> 
    <publisher>ACA</publisher>
  </composition>

  <composition composer="c3">
    <title>Trio: Dream in D</title>
    <date><year>(1980)</year></date> 
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      Rhapsodic. Passionate. Available on CD 
      <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid%3D913265342/sr%3D1-2/">Two by Three</a></cite> 
      from North/South Consonance (1998).
    </description> 
    <publisher></publisher>
  </composition>

  <composition composer="c4">
    <title>Propos II</title>
    <date><year>(1985)</year></date> 
    <length>11'</length>
    <instruments>2 tpt</instruments>
    <description>Arrangement from Propos</description> 
    <publisher></publisher>
  </composition>

  <composition composer="c4">
    <title>Rictus En Mirroir</title>
    <date><year>(1985)</year></date> 
    <length>14'</length>
    <instruments>fl, ob, hpschd, vc</instruments>
    <description></description> 
    <publisher></publisher>
  </composition>

</catalog>

Element Declarations


Content Specifications


ANY

<!ELEMENT catalog ANY>

#PCDATA

  <year>1984</year>

<!ELEMENT year (#PCDATA)>

#PCDATA

<year>1999</year>
<year>99</year>
<year>1999 C.E.</year>
<year>
 The year of our Lord one thousand, nine hundred, and ninety-nine
</year>
<year>
 Delicious, delicious. Oh how boring.
</year>
<year>
<month>January</month>
<month>February</month>
<month>March</month>
<month>April</month>
<month>May</month>
<month>June</month>
<month>July</month>
<month>August</month>
<month>September</month>
<month>October</month>
<month>November</month>
<month>December</month>
</year>
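
Since (#PCDATA) accepts any text at all, every year element above except the last one (which contains child elements) is valid. A rule like "four digits" has to be enforced by the application instead. A minimal Python sketch of such a check with the standard library follows. The function name is_four_digit_year is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

FOUR_DIGITS = re.compile(r"[0-9]{4}")

def is_four_digit_year(year):
    """True if the element has no children and its text is exactly four digits."""
    text = (year.text or "").strip()
    return len(year) == 0 and FOUR_DIGITS.fullmatch(text) is not None

samples = [
    "<year>1999</year>",
    "<year>99</year>",
    "<year>1999 C.E.</year>",
    "<year><month>January</month></year>",
]
for sample in samples:
    print(sample, is_four_digit_year(ET.fromstring(sample)))
```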

#PCDATA

There are a number of elements in the example document that only contain PCDATA:

<!ELEMENT category     (#PCDATA)>
<!ELEMENT abstract     (#PCDATA)>
<!ELEMENT keyword      (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright    (#PCDATA)>
<!ELEMENT first_name   (#PCDATA)>
<!ELEMENT middle_name  (#PCDATA)>
<!ELEMENT last_name    (#PCDATA)>
<!ELEMENT title        (#PCDATA)>
<!ELEMENT year         (#PCDATA)>
<!ELEMENT instruments  (#PCDATA)>
<!ELEMENT publisher    (#PCDATA)>
<!ELEMENT length       (#PCDATA)>

Comments in DTDs

<!-- e.g. "1999 New York Women Composers", 
     not "Copyright 1999 New York Women Composers" -->
<!ELEMENT copyright (#PCDATA)>

Child Elements

    <date><year>1994</year></date> 
<!ELEMENT date (year)>

Child Elements

You only have to declare the immediate children

   <maintainer email="elharo@metalab.unc.edu" 
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
  </maintainer>

  <composer id="c1">
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer> 
<!ELEMENT maintainer (name)>
<!ELEMENT composer (name)>

Sequences

    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
<!ELEMENT name (first_name, middle_name, last_name)>

More Sequences

ELEMENT

One or More Children +

  <cataloging_info>
    <abstract>Compositions by the members of New York Women Composers</abstract>
    <keyword>music publishing</keyword>
    <keyword>scores</keyword>
    <keyword>women composers</keyword>
    <keyword>New York</keyword>
  </cataloging_info>
<!ELEMENT cataloging_info (abstract, keyword+)>

Zero or More Children *

<!ELEMENT catalog (category, cataloging_info, last_updated, copyright, 
                   maintainer, composer*, composition*)>

Zero or One Children ?

  <composition composer="c1">
    <title>Trio for Flute, Viola and Harp</title>
    <date><year>1994</year></date> 
    <length>13'38"</length>
    <instruments>fl, hp, vla</instruments>
    <description>
      <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, 
      Christine Ims, and Susan Jolles. In 3 movements :</p>
      <ul>
        <li>mvt. 1: 5:01</li>
        <li>mvt. 2: 4:11</li>
        <li>mvt. 3: 4:26</li>
      </ul>  
    </description>
    <publisher>Theodore Presser</publisher>
  </composition>
<!ELEMENT composition 
   (title, date, length?, instruments, description?, publisher?)>

Choices

<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
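
With this choice a date holds either a year or an ISODate, never both and never neither. The instances below sketch the three cases; the ISODate value is hypothetical.

```xml
<!-- Valid: the year alternative -->
<date><year>1994</year></date>

<!-- Valid: the ISODate alternative (hypothetical value) -->
<date><ISODate>1994-04-15</ISODate></date>

<!-- Invalid: both alternatives at once -->
<date><year>1994</year><ISODate>1994-04-15</ISODate></date>
```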

Mixed Content

<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>
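
A rough instance of what these declarations allow: text and the listed elements may be interleaved in any order and any number. The wording inside the elements is made up for illustration, and a carries no href here because this slide declares no attributes for it.

```xml
<description>
  Premiered at Queens College. See <a>the program notes</a> and
  <cite>a review in <a>a local paper</a></cite>.
  <ul>
    <li>mvt. 1: 5:01</li>
    <li>mvt. 2: 4:11</li>
  </ul>
  <p>In 3 movements.</p>
</description>
```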

Content Models You Can't Declare
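
Two limitations, offered as illustrations rather than a complete list: once text is allowed, a DTD cannot constrain the order or number of the child elements, and a range such as "two to four keywords" has no shorthand and must be spelled out.

```xml
<!-- Not legal DTD syntax: #PCDATA may not appear inside a sequence -->
<!-- <!ELEMENT description (#PCDATA, ul)> -->

<!-- Legal but looser: any mix of text and ul elements, in any order -->
<!ELEMENT description (#PCDATA | ul)*>

<!-- "Two to four keywords" has to be written out by hand -->
<!ELEMENT cataloging_info (abstract, keyword, keyword, (keyword, keyword?)?)>
```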


Attribute Declarations

Recall this element:

  <maintainer email="elharo@metalab.unc.edu" 
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
  </maintainer>

It is declared like this:

<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA "webmaster@nywc.org">
<!ATTLIST maintainer url CDATA "http://www.ibiblio.org/nywc">

The general format of an <!ATTLIST> declaration is:

<!ATTLIST Element_name Attribute_name Type Default_value>
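
As a sketch of what the Default_value slot does with the two declarations above: when the document omits an attribute, a parser that has read the DTD reports the declared default in its place.

```xml
<!-- In the document: no url attribute -->
<maintainer email="elharo@metalab.unc.edu">
  <name>
    <first_name>Elliotte</first_name>
    <middle_name>Rusty</middle_name>
    <last_name>Harold</last_name>
  </name>
</maintainer>

<!-- What the application sees: the default fills in for url -->
<maintainer email="elharo@metalab.unc.edu"
            url="http://www.ibiblio.org/nywc">
  <name>
    <first_name>Elliotte</first_name>
    <middle_name>Rusty</middle_name>
    <last_name>Harold</last_name>
  </name>
</maintainer>
```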

Multiple Attribute Declarations

  <maintainer email="elharo@metalab.unc.edu" 
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
  </maintainer>

It is declared like this:

<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA "webmaster@nywc.org">
<!ATTLIST maintainer url CDATA "http://www.ibiblio.org/nywc">

But it can also be declared in a single <!ATTLIST> declaration like this:

<!ATTLIST maintainer email CDATA "webmaster@nywc.org" url CDATA "http://www.ibiblio.org/nywc">

This is more obvious with better indentation:

<!ATTLIST maintainer email CDATA "webmaster@nywc.org" 
                     url   CDATA "http://www.ibiblio.org/nywc">

Attribute Default Values


#REQUIRED

<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #REQUIRED 
                     url   CDATA #REQUIRED>

#IMPLIED

<!ELEMENT a (#PCDATA)>
<!ATTLIST a href CDATA #IMPLIED>

#FIXED

<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #FIXED "webmaster@nywc.org" 
                     url   CDATA #REQUIRED>
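
Under this declaration a document may omit email or repeat the fixed value, but may not change it. Two instances as a sketch:

```xml
<!-- Valid: email is omitted, so the parser supplies webmaster@nywc.org -->
<maintainer url="http://www.macfaq.com/personal.html">
  <name>
    <first_name>Elliotte</first_name>
    <middle_name>Rusty</middle_name>
    <last_name>Harold</last_name>
  </name>
</maintainer>

<!-- Invalid: email differs from the fixed value -->
<maintainer email="elharo@metalab.unc.edu"
            url="http://www.macfaq.com/personal.html">
  <name>
    <first_name>Elliotte</first_name>
    <middle_name>Rusty</middle_name>
    <last_name>Harold</last_name>
  </name>
</maintainer>
```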

Attribute Types


CDATA

<!ATTLIST maintainer email CDATA #REQUIRED 
                     url   CDATA #IMPLIED>

ID

<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>

IDREF

<!ELEMENT composition (title, date, length?, 
   instruments, description?, publisher?)>
<!ATTLIST composition composer IDREF #REQUIRED>
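
An IDREF value must match the value of some ID attribute in the same document. Using the composer and composition shown earlier, with the optional children left out for brevity:

```xml
<composer id="c1">
  <name>
    <first_name>Julie</first_name>
    <middle_name></middle_name>
    <last_name>Mandel</last_name>
  </name>
</composer>

<!-- composer="c1" points at the composer element whose id is c1 -->
<composition composer="c1">
  <title>Trio for Flute, Viola and Harp</title>
  <date><year>1994</year></date>
  <instruments>fl, hp, vla</instruments>
</composition>
```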

IDREFS

<!ELEMENT composition (title, date, length?, 
   instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
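
An IDREFS value is a whitespace-separated list of IDs, each of which must exist in the document. A minimal sketch, assuming a second composer with the hypothetical id c2 and a made-up title:

```xml
<!-- Both c1 and c2 must be ID values declared elsewhere in the document;
     c2 and the title are hypothetical -->
<composition composer="c1 c2">
  <title>Untitled Duet</title>
  <date><year>1994</year></date>
  <instruments>fl, hp</instruments>
</composition>
```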

Finished DTD

<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>

<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>

<!ELEMENT catalog (category, cataloging_info, last_updated, 
   copyright, maintainer, (composer | composition)*)>

<!ELEMENT cataloging_info (abstract, keyword+)>

<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>

<!ELEMENT maintainer (name)>
<!ELEMENT name (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ATTLIST maintainer email CDATA #REQUIRED 
                     url   CDATA #IMPLIED>
                     
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>

<!ELEMENT composition (title, date, length?, 
                          instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>

<!ATTLIST a href CDATA #REQUIRED>
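
A small document that should validate against this DTD, as a sketch. The file name nywc.dtd and the category and last_updated values are assumptions for illustration; the rest is taken from the earlier slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "nywc.dtd">
<catalog>
  <!-- hypothetical value -->
  <category>Small chamber ensembles</category>
  <cataloging_info>
    <abstract>Compositions by the members of New York Women Composers</abstract>
    <keyword>music publishing</keyword>
    <keyword>scores</keyword>
  </cataloging_info>
  <!-- hypothetical value -->
  <last_updated>March 16, 2005</last_updated>
  <copyright>1999 New York Women Composers</copyright>
  <maintainer email="elharo@metalab.unc.edu"
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name>
      <middle_name>Rusty</middle_name>
      <last_name>Harold</last_name>
    </name>
  </maintainer>
  <composer id="c1">
    <name>
      <first_name>Julie</first_name>
      <middle_name></middle_name>
      <last_name>Mandel</last_name>
    </name>
  </composer>
  <composition composer="c1">
    <title>Trio for Flute, Viola and Harp</title>
    <date><year>1994</year></date>
    <length>13'38"</length>
    <instruments>fl, hp, vla</instruments>
    <publisher>Theodore Presser</publisher>
  </composition>
</catalog>
```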

Part VI: Namespaces


Raison d'être

  1. To distinguish between elements and attributes from different vocabularies with the same names.

  2. To group all related elements and attributes together so that a parser can easily recognize them.
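
A sketch of the first point: an XHTML page that embeds a catalog entry has two different title elements. The nywc prefix and its URI are hypothetical; only the XHTML URI is real.

```xml
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:nywc="http://www.example.com/nywc">
  <xhtml:head>
    <!-- The title of the web page -->
    <xhtml:title>Catalog Page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <nywc:composition>
      <!-- The title of a piece of music -->
      <nywc:title>Trio for Flute, Viola and Harp</nywc:title>
    </nywc:composition>
  </xhtml:body>
</xhtml:html>
```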


The Need for Namespaces


Namespaces disambiguate elements


Namespace Syntax


Namespace URIs


Binding Prefixes to Namespace URIs


Binding Prefixes to Namespace URIs Example

<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:xlink="http://www.w3.org/1999/xlink">
  <xhtml:head><xhtml:title>Three Namespaces</xhtml:title></xhtml:head>
  <xhtml:body>
    <xhtml:h1 align="center">An Ellipse and a Rectangle</xhtml:h1>
    <svg:svg xmlns:svg="http://www.w3.org/2000/svg" 
             width="12cm" height="10cm">
      <svg:ellipse rx="110" ry="130" />
      <svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
    </svg:svg>
    <xhtml:p xlink:type="simple" 
      xlink:href="ellipses.html">
      More about ellipses
    </xhtml:p>
    <xhtml:p xlink:type="simple" xlink:href="rectangles.html">
      More about rectangles
    </xhtml:p>
    <xhtml:hr/>
    <xhtml:p>Last Modified February 13, 2000</xhtml:p>    
  </xhtml:body>
</xhtml:html>

The Default Namespace
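
An xmlns attribute with no prefix sets the namespace for the element that carries it and for its unprefixed descendants. A trimmed variant of the previous example, as a sketch:

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Default Namespaces</title></head>
  <body>
    <h1>An Ellipse</h1>
    <!-- The default changes here, for svg and its descendants only -->
    <svg xmlns="http://www.w3.org/2000/svg" width="12cm" height="10cm">
      <ellipse rx="110" ry="130" />
    </svg>
    <!-- Back in the XHTML namespace -->
    <p>More about ellipses</p>
  </body>
</html>
```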


Unprefixed attributes are never in any namespace
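
The default namespace applies to elements only. In the sketch below, only the prefixed attribute is in a namespace:

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="12cm" height="10cm">
  <!-- rx and ry are in no namespace, although ellipse is in the SVG namespace -->
  <ellipse rx="110" ry="130" />
  <!-- xlink:href is in the XLink namespace because it carries a prefix -->
  <a xlink:href="ellipses.html">
    <rect x="4cm" y="1cm" width="3cm" height="6cm" />
  </a>
</svg>
```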


URIs matter, not prefixes
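
A namespace-aware parser compares the local name and the URI; the prefix is only a stand-in. As a sketch (the last URI is hypothetical):

```xml
<!-- These three elements have the same name: local name p,
     namespace http://www.w3.org/1999/xhtml -->
<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">More about ellipses</xhtml:p>
<h:p xmlns:h="http://www.w3.org/1999/xhtml">More about ellipses</h:p>
<p xmlns="http://www.w3.org/1999/xhtml">More about ellipses</p>

<!-- This one is different: same prefix, different URI -->
<xhtml:p xmlns:xhtml="http://www.example.com/not-xhtml">More about ellipses</xhtml:p>
```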


Namespace URIs do not necessarily point to a document, page, or schema


Namespaces and DTDs
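
DTDs know nothing about namespaces, so a DTD has to declare the qualified names, prefix included, and the xmlns attributes themselves. A simplified sketch for the SVG fragment used earlier; this is not the real SVG DTD.

```xml
<!-- The DTD matches the literal names svg:svg, svg:ellipse, svg:rect -->
<!ELEMENT svg:svg (svg:ellipse | svg:rect)*>
<!-- Declaring xmlns:svg as #FIXED keeps the prefix bound to the right URI -->
<!ATTLIST svg:svg xmlns:svg CDATA #FIXED "http://www.w3.org/2000/svg"
                  width     CDATA #IMPLIED
                  height    CDATA #IMPLIED>

<!ELEMENT svg:ellipse EMPTY>
<!ATTLIST svg:ellipse rx CDATA #REQUIRED
                      ry CDATA #REQUIRED>

<!ELEMENT svg:rect EMPTY>
<!ATTLIST svg:rect x      CDATA #IMPLIED
                   y      CDATA #IMPLIED
                   width  CDATA #REQUIRED
                   height CDATA #REQUIRED>
```

A document that uses a different prefix for the same URI would be namespace-equivalent but invalid against this DTD.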


To Learn More


Copyright 2002-2005 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified March 16, 2005