Friday, 29 April 2011

Testing Scala 2.9.0 (RC2) parallel collections: four extra key strokes, double speed

We have just tried the new parallel collections that you can find in Scala 2.9.0.RC2.

By adding .par at a few places, the software we tested ran almost twice (1.9 x) as fast on a two core processor. Running the same code on a four core processor was, as expected, quicker (2.7 x), but not four times as fast. That's quite a performance boost, with close to zero programming effort.

The software we've tested validates (electronic) pronunciation dictionaries, where each entry has an orthography, a phonetic transcription and some other stuff. The program runs a large number of quality checks to find problems (faulty transcriptions, inconsistencies, etc) that are hard or impossible for a human lexicographer to find. It runs hundreds or even thousands of validation rules, using regular expressions and other string processing, on a hundred thousand or more dictionary entries.

The software runs a sequence of validation rules on each input entry. The validation rules are independent of each other, suitable for running in parallel. The rules, living in a Seq, are applied in sequence in a call to map(...). By calling .par.map(...) on the Seqs holding the validation rules, a multi-core processor is now able to perform the validation in parallel (par returns a parallel version of a collection).

Apart from using parallel collections at the point where the validation rules are run, we also run the main loop, reading the input lexicon data, using a parallel collection. Adding parallel collections at different places (the outermost loop and inside the validation) seems to add to the performance gain.

An initial problem that we had, was that the Scala 2.9.0.RC2 API documentation fooled us to believe that foldLeft would, just like map, run in parallel. That appears to be incorrect. We had to change calls to foldLeft into calls to map (followed by an additional foldLeft to aggregate the result). I don't know if I've misunderstood the documentation, or if parallel foldLeft is pending.

Anyway, double speed, or more, with zero effort. It sounds too good to be true, but this quick test suggests that it works like a charm.

And now I want more cores.