XML News from Thursday, December 11, 2008

The W3C has published the first working draft of rdf:text: A Datatype for Internationalized Text:

The datatype identified by the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#text (abbreviated rdf:text) allows for the representation of internationalized text strings. In addition to the RIF and OWL specifications, this datatype is expected to supersede RDF's plain literals with language tags, cf. [5], which is why this datatype has been added into the rdf: namespace.

Value Space. The value space of rdf:text is the set of all pairs of the form ( "text" , "lang" ), where "text" is a string and "lang" is either the empty string "" or a lowercase language tag.

Lexical Space. A lexical value of rdf:text is a string "val" that contains at least one @ character (U+40) and that satisfies the following condition:

Let i be the position of the last @ (U+40) character in "val", and let "abc" and "tag" be the substrings of "val" containing the characters up to and after position i (noninclusive), respectively. Then, "tag" MUST be either empty or a valid language tag.

Each such lexical value is assigned a data value ( "abc", "lc-tag" ), where "lc-tag" is the string "tag" converted to lowercase.

Editor's Note: Open Issues: The definition of the set of characters, particularly the fact that it is infinite, as well as the compatibility with XML strings - whether the string part of the lex & val space should be the same as xs:string - are still under discussion.

Lexical value "Family Guy@en" is mapped to the data value ( "Family Guy" , "en" ), and "Family Guy@" is mapped to ( "Family Guy" , "" ). Furthermore, "Family Guy" is not a valid lexical value of rdf:text because it does not contain the @ (U+40) character.
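
As a rough illustration of the mapping quoted above, the following Python sketch splits a lexical value at its last @ character and lowercases the tag. The function name parse_rdf_text and the LANGUAGE_TAG pattern are assumptions made for this example and do not come from the working draft. In particular, the pattern only approximates the general shape of a language tag; it does not implement a full language tag grammar such as the one in BCP 47.

```python
import re
from typing import Tuple

# Assumption: a loose shape check (letters, then hyphen-separated
# alphanumeric subtags). This is NOT full language tag validation.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def parse_rdf_text(val: str) -> Tuple[str, str]:
    """Map an rdf:text lexical value to a (text, lowercase tag) pair."""
    # Position of the last '@' (U+0040); -1 means there is none.
    i = val.rfind("@")
    if i == -1:
        raise ValueError("lexical value must contain at least one '@'")

    # Characters before and after position i, excluding the '@' itself.
    text, tag = val[:i], val[i + 1:]

    # The tag must be empty or look like a language tag.
    if tag and LANGUAGE_TAG.fullmatch(tag) is None:
        raise ValueError(f"not a plausible language tag: {tag!r}")

    return text, tag.lower()


if __name__ == "__main__":
    print(parse_rdf_text("Family Guy@en"))
    print(parse_rdf_text("Family Guy@"))
```

Because the split happens at the last @, any earlier @ characters stay in the text part, so a string such as "a@b@EN-us" would yield the text "a@b" and the tag "en-us" under this sketch.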