XML News from Friday, May 21, 2004

Day 3 of WWW2004 kicks off with the usual 9:00 A.M. morning keynote. The conference is beginning to drag a bit, and there are maybe half as many people here this morning as yesterday. No pictures today. I remembered to charge up the camera and then forgot to put it back in my bag when I left the house.

Audio Recorder did provide a nice simple way to record an MP3 file of the day's proceedings. However, the internal microphone in my PowerBook paid more attention to the keyboard clicks than the speakers. I've brought a couple of external microphones this morning to try them out. Hmm, OK, the first microphone has the wrong plug for this laptop. That's why I brought two. Hmm, the second one plugs in but doesn't seem to be hearing anything. Possibly that's not a microphone port on the back of my TiBook. Maybe I need a USB microphone? No, according to this Apple tech note and another tech note, that is indeed a line-in port on the back of my laptop. Maybe I need a newer microphone? A sensible person might have tried plugging these in at home before dragging them into town, but what would be the excitement of that? Maybe I'll run down to CompUSA at lunch.


This morning's keynote will be delivered by Mozelle Thompson, of the Federal Trade Commission. "I have a few notes, but I don't have a PowerPoint presentation. I hate those. Good morning. I'm from the government and I'm here to help you." "We're a small agency but there are those who love us." When listing the laws they enforce, the National Do Not Call list got applause. The Can-Spam Act got silence. About 4 or 5 people in the room admit to having been victims of identity theft. It looks like about half the audience is from outside the U.S. There are 53,000,000 phone numbers on the Do-Not-Call list. He opposes spyware legislation "at this time." He thinks such legislation is likely to be overbroad, and cover legitimate, consumer-beneficial activities like instant updates and anti-virus updates. To head off Congress, he asked industry to, within 90 days, give consumers meaningful notice of what they're doing. Next, he asked them to develop a public education campaign about spyware. Finally he asked the industry to develop a mechanism to talk to Washington to identify the really bad actors. If industry does not act, he thinks legislators will act on incomplete, inaccurate information. Stopping spam requires international cooperation. Defensive use of patents is not well reflected in the current system.


Today's second keynote is "Higher Learning in the Digital Age" from James J. Duderstadt of the University of Michigan. He thinks the Internet/Web could change higher education in the next decade or two as radically as it changed in the two decades after the Civil War. Peer-to-peer interaction is replacing traditional professor-student learning. (I'm not sure I believe this. I certainly don't see it in my own classes at Polytechnic. To the extent they're learning from each other, they're learning bad habits. Perhaps they could learn very rough things like how to put a window on the screen, but they certainly aren't learning how to write code well by reading the Internet and talking to each other. Maybe what he describes is more true in elite universities like Carnegie-Mellon, one of his examples, or in undergraduate classes. I mostly teach graduate students, though with some undergraduates mixed in. However, I rarely see my students teaching each other. Half of them barely talk to anyone else in the class. When they do talk to each other, they're more interested in copying the homework than learning or teaching how to do something.) For the near term (the next decade) universities will look pretty much like they do now. But over the longer term, the basic structure of the university may change in dramatic ways.


For the morning sessions I decided to go to the panel discussion of "Multimodal Interaction with XML: Are We There Yet?" Alan Turing: "A machine is intelligent only if it can carry on an intelligent conversation." Participants include Kuansan Wang of Microsoft, IBM's Yi-Ming Chee, Motorola's Mark Randolph, AT&T's Michael Johnson, the W3C's Max Froumentin, and Carnegie Mellon's Alex Rudnicky.

Question: Is it possible yet to use XML as the prime language in multimodal interaction, and what is still missing in current XML technology in order for XML to play that role? According to Michael Johnson?, speech is the most developed modality. Pen and gesture support is far behind. "Is it useful to create common XML standards for multimodal interaction?" "What kind of role should XML standards play, and what part of multimodal interaction can be standardized to accelerate the use of XML in multimodal interaction?" There's some restrained dispute about the verbosity of XML, and whether it matters. "What levels of semantics can XML represent in multimodal interaction?" Alex Rudnicky says we don't yet know the right primitives or levels of abstraction for multimodal interaction. "What is the product/research use of XML for multimodal interaction?"


First afternoon session. The current talk is about TeXQuery, which has nothing to do with TeX. It's an extension to XQuery for full text search. It allows you to prioritize matches to particular terms; e.g. finding all documents with a 0.8 score for "Goddesses" and a 0.2 score for "Nike." Co-authors are Sihem Amer-Yahia, Chavdar Botev, and Jayavel Shanmugasundaram. I missed which one was actually presenting. It looks interesting, but the name has to change.
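The idea of weighting search terms can be sketched roughly as follows. This is not TeXQuery syntax, and the scoring rule is an assumption made for illustration, not the one from the talk: each term's relative frequency in a document is multiplied by its weight, and the results are summed. The function name `weighted_score` and the sample documents are made up.

```python
import re
from collections import Counter


def weighted_score(text, weights):
    """Score a document as the weighted sum of relative term frequencies.

    Assumed scoring rule for illustration only; not TeXQuery's actual model.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(w * counts[term.lower()] / len(words) for term, w in weights.items())


# Hypothetical documents, scored with the weights from the example above.
docs = {
    "myths": "Goddesses of Greece: Nike and Athena were goddesses of victory and wisdom",
    "shoes": "Nike shoes and more Nike shoes",
}
weights = {"goddesses": 0.8, "nike": 0.2}

# Rank document names from best match to worst.
ranked = sorted(docs, key=lambda name: weighted_score(docs[name], weights), reverse=True)
print(ranked)
```

With these weights a page that merely repeats "Nike" ranks below one that is actually about goddesses, which is presumably the point of letting the query author set priorities.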


Sebastiano Vigna, of the Università degli Studi di Milano, is talking about "The WebGraph Framework I: Compression Techniques" (co-author: Paolo Boldi). He wants to store the entire web directed graph (in a mathematical sense where URIs are nodes and links are arcs) which requires significant compression because the Web is so big. It's nice to see some math for a change, but the practical impact (if any) escapes me. Apparently it escapes Vigna as well. "We do it because it's fun." It does let you do Google-like PageRank tests on a PC. They got slashdotted. They can compress down to 3-3.5 bits per link. "WebGraph exploits the fact that many links within a page are consecutive (with respect to lexicographic order)." He's running Red Hat on his laptop. First time I've noticed that at this show (though PowerBooks are fairly common).
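Why consecutive links help can be seen in a small sketch. This is a simplified illustration, not WebGraph's actual format, which does considerably more than this. It assumes nodes are numbered by the lexicographic order of their URIs, stores each page's links as gaps between sorted node IDs, and measures the cost with Elias gamma codes, where small numbers take few bits. All names and the sample IDs are hypothetical.

```python
def gamma_bits(n):
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * n.bit_length() - 1


def to_gaps(successors):
    """Turn successor node IDs into gaps: the first ID, then the differences.

    Assumes IDs are positive integers; duplicates are dropped.
    """
    ordered = sorted(set(successors))
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def from_gaps(gaps):
    """Rebuild the sorted successor list from its gaps."""
    result, total = [], 0
    for g in gaps:
        total += g
        result.append(total)
    return result


# Hypothetical page whose links point at nearly consecutive node IDs.
links = [100000, 100001, 100002, 100003, 100005, 100006]
gaps = to_gaps(links)  # [100000, 1, 1, 1, 2, 1]

raw_bits = sum(gamma_bits(x) for x in links)  # cost of coding the IDs directly
gap_bits = sum(gamma_bits(g) for g in gaps)   # cost of coding the gaps instead
print(raw_bits, gap_bits)
```

Only the first ID on each page is expensive; every later link costs a handful of bits when its target sits near the previous one.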


For the final session of the regular conference, I plan to return to the W3C track to hear about "Future Work in W3C - Public Q&A" chaired by Steve Bratt. First, Tim Berners-Lee is talking about "What is coming up in W3C?" and then there's the public Q&A session. TimBL uses a PowerBook, and writes (or at least publishes) his slides in HTML, and he has iTunes playing some New York-themed jazz before the session starts. The outline for the talk is:

What Might be Next for W3C? ...
New Working Groups on XML Binary Characterization, SYMM, Math Interest Group, RDF Data Access, and Semantic Web Best Practices and Deployment. The QA working group has not attracted a lot of volunteers from W3C members. Everyone wants someone else to do it.
Considerations for New Work (part 1)

Future of XML. XML attracted the least interest of all the tracks here at WWW 2004. "Maybe it's kind of done." XML is the foundation. "For a lot of people XML's done". (I'm not convinced: I think a lot of XML folks may have just gone to Amsterdam last month instead of this show. There were a lot of people there I don't see here. In fact, off the top of my head there's exactly one speaker in common between the two.) However, RDF has noticed that merging pieces of documents isn't really well addressed. Maybe this requires more work?

The Semantic Web is driving interest in privacy because people are scared of what the semantic Web may do.

Should the W3C open up broad horizontal apps/vocabularies such as life sciences, geospatial, calendaring, social networking (e.g. Friendster), publishing/syndication/RSS, etc.? Calendaring work is ongoing at the IETF. He claims RSS 0.9 wasn't RDF. I don't think that's quite true. It was consistent with the RDF draft in existence at the time.

There's an upcoming workshop on device independence in October, location to be announced.

Considerations for New Work (part 2)
Usability is not just accessibility. Why content filtering for just mobile devices? "It's amazing how little we do that is secure. How few things are signed and encrypted." What are we going to do about the digital divide between the developed and developing world?

Q&A commences. According to IBM's Mary Ellen Zurko, Germany is considering banning JavaScript. One of the panelists (Phillippe?) suggests that asking users if they want to run active code is pointless. They always click OK. Zurko agrees. TimBL: "Should the W3C start looking at mail?" The W3C has three people fighting spam full time. "Nominally not really in our area." Are the groups looking at spam too academic, he wonders?

TimBL: Haystack is a new UI metaphor. There's a "dire need" for a good user interface for the semantic web.

A couple of questions from the audience: What about publishing and annotations? TimBL recounts some interesting history. Amaya does do annotations.

Fabio Vitali wants to know why there aren't synchronous HTML editors/browsers. If I understand him, this is editing directly in the browser frame without switching modes and saving just as you would in a word processor (e.g. like I've been saving these notes in BBEdit directly onto the site for the last three days).

TimBL: "The Semantic Web is not AI. It's just databases.... nobody doing the semantic web is holding their breath for strong artificial intelligence."


The hotel air conditioning seems to be set to "Ice Box" and I've developed a nasty cold over the last three days. If it doesn't get worse, I'll probably be back tomorrow for the Developer Day.