Summary

Converting data from non-XML formats to XML is not conceptually difficult, but no tool will do it for you automatically. You have to roll up your sleeves and write the code. Once you’ve parsed the input data and organized it in the form you want, writing it out as XML is straightforward. The hard part is usually parsing the non-XML input in the first place.
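
As a minimal sketch of the output step, assume a hypothetical pipe-delimited record of the form name|amount. The class name FlatToXml, the element names Item, Name, and Amount, and the record layout are all invented for illustration. The one detail that is easy to forget is escaping reserved characters in the text you write out.

```java
class FlatToXml {

    // Escape &, <, and > so the string is safe to use as element content.
    // (Escaping > is only strictly required after "]]", but always doing it is harmless.)
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parse one hypothetical "name|amount" record and return it as an Item element.
    static String toXml(String line) {
        String[] fields = line.split("\\|", -1); // -1 keeps trailing empty fields
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected 2 fields: " + line);
        }
        return "<Item><Name>" + escape(fields[0]) + "</Name>"
             + "<Amount>" + escape(fields[1]) + "</Amount></Item>";
    }
}
```

Calling FlatToXml.toXml("AT&T|12.50") returns an Item element whose Name content is AT&amp;T. Real input formats are rarely this regular, which is why the parsing side tends to dominate the work.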

Unless the data is truly flat (and real-world data very rarely is), you’ll generally want to arrange your XML markup to reflect the hierarchy and structure of the data. There are several ways to perform the conversion, and they differ primarily in where the work is done. If you’re extracting data from a relational database using JDBC, you may be able to make multiple SQL queries, possibly joining tables, so that the data that enters your program is already in more or less the form and order you want. Alternatively, if the input is coming from flat files, you can read it into a flat structure such as a list or an array, and then use Java code to rearrange it into a more hierarchical structure such as a tree. Finally, you can read the data in the most naive format possible, write it out again as an almost literal XML copy of the original structure, and then post-process that initial XML document with XSLT or XQuery to get the structure you want. All three approaches produce the same document in the end. They differ primarily in when the hard work is done: at the beginning, in the middle, or at the end. Which one to choose depends largely on your relative comfort with SQL, Java, and XSLT.
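
The middle approach, rearranging flat records in Java, might be sketched roughly as follows. This is only an illustration under stated assumptions: the two-column agency/bureau rows, the class name GroupRecords, and the Budget, Agency, Name, and Bureau element names are hypothetical, and a real program would read the rows from a file rather than receive them as a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupRecords {

    // Group flat {agency, bureau} rows by agency, preserving first-seen order.
    static Map<String, List<String>> group(List<String[]> rows) {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        for (String[] row : rows) {
            tree.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return tree;
    }

    // Escape & and < for element content; & must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Write the grouped data as nested elements (hypothetical vocabulary).
    static String toXml(Map<String, List<String>> tree) {
        StringBuilder sb = new StringBuilder("<Budget>\n");
        for (Map.Entry<String, List<String>> agency : tree.entrySet()) {
            sb.append("  <Agency>\n");
            sb.append("    <Name>").append(escape(agency.getKey())).append("</Name>\n");
            for (String bureau : agency.getValue()) {
                sb.append("    <Bureau>").append(escape(bureau)).append("</Bureau>\n");
            }
            sb.append("  </Agency>\n");
        }
        return sb.append("</Budget>\n").toString();
    }
}
```

The same grouping could instead be pushed into an SQL query with an ORDER BY clause, or deferred to an XSLT stylesheet applied to a flat XML copy of the rows. The sketch only shows where the work lands when Java does it.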


Copyright 2001, 2002 Elliotte Rusty Harold (elharo@metalab.unc.edu). Last modified August 21, 2001.