Friday 5 December 2008

Scala: Problems using the XML API

I've encountered some problems using the Scala XML API. The first one had to do with scala.xml.XML.loadFile throwing away comment nodes of the input XML file.

A helpful person on the scala-user list suggested instead using scala.xml.parsing.ConstructingParser.fromFile. This worked nicely, keeping the comment elements of the input file intact. However, when processing larger XML files, this approach did not work well, resulting in out of memory exceptions.

Finally, I got yet a helpful answer on the scala-user list, this time in the form of some code, translating Java XML nodes into the Scala equivalents.

If you get into the same trouble as I did, you may want to take a look at this code snippet posted on the scala-user list by David Pollak. (You might have to change the code a bit to suit your needs, though.)

Yet a problem I've encountered: you might be hit by a performance problem when extracting child nodes of a large Elem using the \\ or \ operators. (The fix seems to be to loop over the child nodes instead.)

Summary: The current Scala XML API may not work flawlessly if you both want to process rather large documents and at the same time keep all the information of the original input XML file... but it works fine if you write your own XML file reader (see link above) and are careful with the use of \\ or \ on large Elems.


Here's an earlier post on Scala XML processing.

No comments: