Friday 21 October 2011

Scala blunder: appending to a Seq that is a List

I recently made a mistake in a loop reading lines from a file, doing some string manipulation and adding the result to a collection. A seemingly trivial Scala script just refused to halt. My mistake is illustrated by the following two toy examples, adding integers to a Seq and a Vector, respectively:

var x1 = Vector[Int]()
for(i <- 0 to 100000) { x1 = x1 :+ i }

var x2 = Seq[Int]()
for(i <- 0 to 100000) { x2 = x2 :+ i }
One of the above for loops runs about 38,648 times slower than the other one (according to a single, somewhat sloppy benchmark using Scala 2.9.1). The explanation, I believe, is that the Seq turned out to be backed by a List. Lists hate being appended to (:+), and this hatred manifests itself in bad performance. Good to know if you want a program to be impressingly slow.

By the way, this made me think of another one:
var s1 = ""
for(i <- 0 to 100000) s1 = s1 + i

var s2 = ""
for(i <- 0 to 100000) s2 = s2.concat(i.toString)
I don't know why you'd want to create a string like the above, but the version using + is about four times slower than the one using concat (Scala 2.9.1).

2 comments:

Anonymous said...

you will want IndexedSeq which has efficient append. The default IndexedSeq implementation indeed is Vector. There was a discussion on the scala mailing lists about whether the default Seq implementation should also be Vector, because it is almost always as fast as List. But for the time being, make sure you use IndexedSeq.

http://www.scala-lang.org/docu/files/collections-api/collections_5.html

http://www.scala-lang.org/docu/files/collections-api/collections_40.html

For large operations like the one you were showing, you may better use a builder, a mutable data structure which efficiently assembles your data and then returns the target data structure when you call result():

val b = Seq.newBuilder[Int]; for(i <- 0 to 100000) { b += i }; b.result()

or make a shortcut via Seq.tabulate:

Seq.tabulate( 100000 )( (i: Int) => i )

Jesper Nordenberg said...

Yes, it's stupid that List is the default immutable collection type. I think most people who really need a List knows how and when to create one.