XML News from Tuesday, March 25, 2008

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published a new working draft of RDFa Primer 1.0.

Current Web pages, written in XHTML, contain inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When authors and publishers can express this data precisely, and when tools can read it robustly, a new world of user functionality becomes available, letting users transfer structured data between applications and Web sites. An event on a Web page can be directly imported into a desktop calendar. A license on a document can be detected to inform the user of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself.

RDFa lets XHTML authors express this structured data using existing XHTML attributes and a handful of new ones. Where data, such as a photo caption, is already present on the page for human readers, the author need not repeat it for automated processes to access it. A Web publisher can easily reuse data fields, e.g. an event's date, defined by other publishers, or create new ones altogether. RDFa gets its expressive power from RDF [RDFPRIMER], though the reader need not understand RDF before reading this document.

For simplicity, instead of using RDF terminology, we use the word "field" to indicate a unit of labeled information, e.g. the "first name" field indicates a person's first name.

RDFa uses Compact URIs, which express a URI using a prefix, e.g. dc:title where dc: stands for http://purl.org/dc/elements/1.1/. In this document, for simplicity's sake, the following prefixes are assumed to be already declared: dc for Dublin Core [DC], foaf for Friend-Of-A-Friend [FOAF], cc for Creative Commons [CC], and xsd for XML Schema Definitions [XSD].
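As a rough illustration of the Compact URI idea, the following Python sketch expands a prefixed name into a full URI by simple concatenation. The `expand_curie` helper and the `PREFIXES` table are hypothetical names introduced here, not part of the draft. Only the dc mapping quoted above is included.

```python
# Prefix table: only the dc mapping given in the text above.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a compact URI such as 'dc:title' into a full URI.

    Hypothetical helper for illustration: it splits on the first colon
    and concatenates the declared prefix URI with the remainder.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a compact URI (no prefix): {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + reference


print(expand_curie("dc:title"))  # http://purl.org/dc/elements/1.1/title
```

The expansion works only if the consumer knows which prefixes were declared at the point where the value appears.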

We use standard XHTML notation for elements and attributes: both are denoted using fixed-width lowercase font, e.g. div, and attributes are differentiated using a preceding '@' character, e.g. @href.

Here's a syntax example from the draft:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
      xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
  <head>
    <title>Jo's Friends and Family Blog</title>
  </head>

  <body>
...
  <p instanceof="cal:Vevent">
    I'm holding
    <span property="cal:summary">
      one last summer Barbecue,
    </span>
    on
    <span property="cal:dtstart" content="20070916T1600-0500">
      September 16th at 4pm.
    </span>
  </p>
...
  <p class="contactinfo" about="http://example.org/staff/jo">
    <span property="contact:fn">Jo Smith</span>.
    <span property="contact:title">Web hacker</span>
    at
    <a rel="contact:org" href="http://example.org">
      Example.org
    </a>.
    You can contact me
    <a rel="contact:email" href="mailto:jo@example.org">
      via email
    </a>.
  </p>
...
  </body>
</html>

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
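To make the concern concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree on a trimmed copy of the draft's example. It suggests that a namespace-aware parser resolves prefixes on element and attribute names, but hands back a value like cal:summary as an opaque string and does not keep the xmlns:cal declaration on the parsed element. The variable names are illustrative. Collecting the declarations from start-ns events is one possible workaround, not anything the draft prescribes.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the draft's example, keeping only the calendar markup.
SAMPLE = """<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
  <body>
    <p instanceof="cal:Vevent">
      <span property="cal:summary">one last summer Barbecue,</span>
    </p>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)
span = root.find(".//span")

# The parser treats the attribute value as plain text: the prefix is
# not resolved against the xmlns:cal declaration.
raw_value = span.get("property")
print(raw_value)  # cal:summary

# The declaration itself is consumed during parsing and is not
# available as an attribute of the element that carried it.
print(root.attrib)  # {}

# Workaround sketch: re-read the document and collect prefix
# declarations from start-ns events. This ignores scoping, so it is
# only adequate when each prefix is declared once.
prefixes = {
    prefix: uri
    for _, (prefix, uri) in ET.iterparse(io.StringIO(SAMPLE), events=("start-ns",))
}

prefix, _, local = raw_value.partition(":")
resolved = prefixes[prefix] + local
print(resolved)  # http://www.w3.org/2002/12/cal/ical#summary
```

Any tool that copies, moves, or reserializes the markup without carrying the declarations along leaves such values unresolvable.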