Friday 21 October 2011

Scala blunder: appending to a Seq that is a List

I recently made a mistake in a loop reading lines from a file, doing some string manipulation and adding the result to a collection. A seemingly trivial Scala script just refused to halt. My mistake is illustrated by the following two toy examples, adding integers to a Seq and a Vector, respectively:

var x1 = Vector[Int]()
for(i <- 0 to 100000) { x1 = x1 :+ i }

var x2 = Seq[Int]()
for(i <- 0 to 100000) { x2 = x2 :+ i }
One of the above for loops runs about 38,648 times slower than the other one (according to a single, somewhat sloppy benchmark using Scala 2.9.1). The explanation, I believe, is that the Seq turned out to be backed by a List. Lists hate being appended to (:+), and this hatred manifests itself in bad performance. Good to know if you want a program to be impressingly slow.

By the way, this made me think of another one:
var s1 = ""
for(i <- 0 to 100000) s1 = s1 + i

var s2 = ""
for(i <- 0 to 100000) s2 = s2.concat(i.toString)
I don't know why you'd want to create a string like the above, but the version using + is about four times slower than the one using concat (Scala 2.9.1).