Nikoloogle Lindbloogle: scala


Friday, 21 October 2011

Scala blunder: appending to a Seq that is a List

I recently made a mistake in a loop that read lines from a file, did some string manipulation and added the result to a collection. A seemingly trivial Scala script just refused to halt. My mistake is illustrated by the following two toy examples, which add integers to a Vector and a Seq, respectively:

var x1 = Vector[Int]()
for(i <- 0 to 100000) { x1 = x1 :+ i }

var x2 = Seq[Int]()
for(i <- 0 to 100000) { x2 = x2 :+ i }

One of the above for loops runs about 38,648 times slower than the other one (according to a single, somewhat sloppy benchmark using Scala 2.9.1). The explanation, I believe, is that the Seq turned out to be backed by a List. Lists hate being appended to (:+): appending to an immutable List copies the entire list, so each append costs time proportional to the list's length, and the loop as a whole becomes quadratic. Good to know if you want a program to be impressively slow.
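If you do need to build up a sequence element by element, here is a small sketch of three ways to avoid the quadratic behaviour: use a Vector (cheap appends), prepend to a List and reverse it once at the end, or collect into a mutable ListBuffer and convert it once. The object and method names are made up for illustration.

```scala
import scala.collection.mutable.ListBuffer

object Appending {
  // Vector supports effectively constant-time appends with :+
  def viaVector(n: Int): Vector[Int] = {
    var v = Vector.empty[Int]
    for (i <- 0 to n) v = v :+ i
    v
  }

  // Prepending (::) to a List is O(1); reverse once at the end to restore order
  def viaPrependReverse(n: Int): List[Int] = {
    var l = List.empty[Int]
    for (i <- 0 to n) l = i :: l
    l.reverse
  }

  // A mutable ListBuffer appends in constant time and converts to a List once
  def viaListBuffer(n: Int): List[Int] = {
    val buf = ListBuffer.empty[Int]
    for (i <- 0 to n) buf += i
    buf.toList
  }
}
```

For this particular toy loop, `(0 to 100000).toList` would of course do the whole job in one go.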

By the way, this made me think of another one:

var s1 = ""
for(i <- 0 to 100000) s1 = s1 + i

var s2 = ""
for(i <- 0 to 100000) s2 = s2.concat(i.toString)

I don't know why you'd want to create a string like the above, but the version using + is about four times slower than the one using concat (Scala 2.9.1).
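Both `+` and `concat` create a brand new String on every iteration, copying everything accumulated so far, so both loops are quadratic; `concat` is merely the cheaper of the two. A minimal sketch of the usual remedy, a StringBuilder that appends into a growable buffer (the object name is made up):

```scala
object StringBuilding {
  // Appends each number's digits into one buffer; no intermediate Strings
  def build(n: Int): String = {
    val sb = new StringBuilder
    for (i <- 0 to n) sb.append(i)
    sb.toString
  }

  // The same result as a one-liner; mkString uses a builder internally
  def buildMk(n: Int): String = (0 to n).mkString
}
```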

Friday, 29 April 2011

Testing Scala 2.9.0 (RC2) parallel collections: four extra key strokes, double speed

We have just tried the new parallel collections that you can find in Scala 2.9.0.RC2.

By adding .par at a few places, the software we tested ran almost twice as fast (1.9x) on a two-core processor. Running the same code on a four-core processor was, as expected, quicker (2.7x), but not four times as fast. That's quite a performance boost, with close to zero programming effort.

The software we've tested validates (electronic) pronunciation dictionaries, where each entry has an orthography, a phonetic transcription and some other stuff. The program runs a large number of quality checks to find problems (faulty transcriptions, inconsistencies, etc) that are hard or impossible for a human lexicographer to find. It runs hundreds or even thousands of validation rules, using regular expressions and other string processing, on a hundred thousand or more dictionary entries.

The software runs a sequence of validation rules on each input entry. The validation rules are independent of each other, suitable for running in parallel. The rules, living in a Seq, are applied in sequence in a call to map(...). By calling .par.map(...) on the Seqs holding the validation rules, a multi-core processor is now able to perform the validation in parallel (par returns a parallel version of a collection).

Apart from using parallel collections at the point where the validation rules are run, we also run the main loop, reading the input lexicon data, using a parallel collection. Adding parallel collections at different places (the outermost loop and inside the validation) seems to add to the performance gain.

An initial problem was that the Scala 2.9.0.RC2 API documentation fooled us into believing that foldLeft would, just like map, run in parallel. That appears to be incorrect. We had to change calls to foldLeft into calls to map (followed by an additional foldLeft to aggregate the result). I don't know if I've misunderstood the documentation, or if parallel foldLeft is pending.
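The structure described above (independent rules in a Seq, applied with map, results aggregated afterwards) can be sketched as below. The `Entry` type and the three rules are hypothetical stand-ins, not our actual validation code. The sketch runs sequentially so that it needs nothing beyond the standard library; the parallel version replaces `rules.map` with `rules.par.map`, which from Scala 2.13 on requires the separate scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`. As far as I understand it, foldLeft is sequential by nature (its operator takes the accumulated value from the left), while parallel collections offer fold (associative operator plus a neutral element) and aggregate for parallel reduction.

```scala
object Validation {
  // Hypothetical dictionary entry and rule types, for illustration only
  final case class Entry(orthography: String, transcription: String)
  type Rule = Entry => Option[String] // Some(message) when the rule finds a problem

  // Independent rules: none depends on the outcome of another
  val rules: Seq[Rule] = Seq(
    (e: Entry) => if (e.orthography.trim.isEmpty) Some("empty orthography") else None,
    (e: Entry) => if (e.transcription.contains("  ")) Some("double space in transcription") else None,
    (e: Entry) => if (e.orthography.exists(_.isDigit)) Some("digit in orthography") else None
  )

  def validate(e: Entry): List[String] = {
    // Step 1: apply every rule. This map is the step parallel collections
    // can spread over several cores (rules.par.map(...)).
    val results: Seq[Option[String]] = rules.map(rule => rule(e))
    // Step 2: aggregate the per-rule results with an ordinary, sequential foldLeft
    results.foldLeft(List.empty[String])((acc, r) => acc ++ r.toList)
  }
}
```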

Anyway, double speed, or more, with zero effort. It sounds too good to be true, but this quick test suggests that it works like a charm.

And now I want more cores.

Wednesday, 15 September 2010

Interview with Maxime Lévesque, author of Squeryl

Squeryl is a great Scala database API. On its website, it is described like this: "A Scala ORM and DSL for talking with Databases with minimum verbosity and maximum type safety".

Preparing an introduction to Squeryl for a Swedish computer magazine, I sent a number of questions to Maxime Lévesque, the man behind Squeryl. The answers were so interesting that I asked his permission to post them here:

Could you describe yourself in a few words?

I'm a dad, a programmer, a hobbyist bass player and percussionist.

I'm the kind of programmer who prefers to write libraries and frameworks to writing applications. If I was in the construction industry I'd probably be making bricks, mortar and nails rather than houses.

Do you develop Squeryl as part of your work, or is it a hobby?

Squeryl started as a hobby, only later did I start using it in a commercial project.

What are the most important features of Squeryl? Why should you use it?

The main reason to use Squeryl in an application, in my opinion, is to have the data access code validated by the compiler. I've seen many projects where the database schema stops evolving after a lot of code has been written against it. Ugly workarounds are sometimes chosen because there isn't enough time to investigate the repercussions of a schema change or conduct all the testing required.

Strongly typed languages are good for "deterministic refactoring". A data access layer needs to be refactorable, as any part of a system does. Perhaps to an even greater extent, because in a sense, bad design decisions get persisted with the data.

A developer needs all the help he can get from tools such as compilers and IDEs. Hard work and discipline don't scale. Why rely on them when you can have automated validation?

Reusability is another big one. Squeryl queries are composable, reusable pieces of code. A query that encodes a particular piece of application logic needs only be written once, and reused anywhere it is needed. I'm a big believer in the DRY principle (Don't Repeat Yourself).

Low verbosity would be another strength. I dislike APIs or frameworks that require you to write more than you should.

What's the story behind Squeryl?

In 2005 I wrote an ORM for dotNet. I was in need of one at the time and I couldn't find a decent one that exploited generics and annotations, so I wrote my own. By the time I considered publishing it, LINQ came out, and instantly obsoleted my ORM (and all other ORMs except HaskellDB in my opinion).

A few years later I started to write a query DSL in Java, and at every step, I got bitten by language limitations. Every time I worked around them, the solution became a bit more ugly and verbose. I then discovered Scala, and started experimenting with writing a statically typed query DSL. I was amazed by the expressivity of the language.

The fact that it was possible to write Squeryl as a library (i.e., without a compiler plug-in) speaks a lot about the potency of the language. The first two attempts were abandoned when they reached a critical level of inelegance. They were Squeryl's pre-history.

Squeryl is in fact my third attempt at a Scala ORM. When I became confident that a fourth rewrite wouldn't be necessary, I published it on GitHub.

If Squeryl didn't exist, what would you use?

If Squeryl didn't exist, I'd have a look at ScalaQuery or Circumflex. I only have a superficial knowledge of them, but I would surely try them out before going to any of the Java based ORMs.

If you are to demo Squeryl (e.g., to a Java programmer), do you have a favourite example?

Here's a one-liner that says a lot:

val avgHeight: Option[Float] = 
  from(people)(p => compute(avg(p.heightInCentimeters)))

Apart from the shortness of the code, we can see a few implicit conversions at work. The compiler "knows" that the avg query can translate into a 32 bit floating point value, but it also "knows" that it is an Option[], because the avg aggregate function is not guaranteed to return something (the table can be empty). In fact it won't compile if you try to refer to it as a (non Option[]) Float.

Where has Squeryl turned up? Who uses it?

I haven't made any survey, it's on my todo list, but I've exchanged emails with developers that are building systems with Squeryl in fields ranging from finance to bioinformatics.

I read something about Lift...?

Ross Mellgren from the Lift team has written an integration module that is part of Lift 2.1 (release candidate).

What's on the roadmap?

High on my priority list is free text search (backed by Lucene). Longer term I'd like to add things like support for sharding and extending the DSL to exploit the geospatial capabilities of databases like Postgres, Oracle and H2.

Is it of any importance that Squeryl was written in Scala? Or was this merely a coincidence?

Without Scala there wouldn't be a way to have strongly typed queries on the JVM without having verbosity that reaches a caricatural level. Not only wouldn't there be Squeryl, but there wouldn't be anything like it.

When Java came out I was impressed with all the features it had built in: serialization, RMI, garbage collection, portability. It was in its time a game changing technology. Today I have the same impression of Scala: the level of static validation that it gives you, all this with minimal verbosity. If I could say just one thing to qualify it, I'd have to say: game changing.

So the answer is yes, Scala made Squeryl possible. I expect a lot of interesting Scala DSLs will get written in many domains in the coming years. I have a few other DSLs I'd like to write myself.

Any particular advice for someone beginning with Squeryl?

I would just copy an example from the Squeryl site, and modify it gradually so that it becomes your own schema. And most importantly, don't hesitate to ask questions in the discussion groups. I'm often impressed by the quality of the answers given by the community.

Thanks a lot for the great answers!

Sunday, 11 April 2010

A tiny Scala case class to clean up user input

We needed to clean up user input entered into a text field. We ended up with a Scala case class that cleans up its constructor string argument a bit, by collapsing runs of spaces into a single space and trimming it. It behaves like this:

scala> Text("    a       a      ") == Text("a a")                                                                
res0: Boolean = true
scala> Text("   a      a       ").text == Text("a a").text                                                      
res1: Boolean = true
scala> Text("  a     a    ").text          
res2: java.lang.String = a a

The code looks like this:

case class Text(private var _text: String) {
 val text = _text.trim.replaceAll(" +", " ")
 _text = text
}

Since the input string, var _text, is private, we can manipulate it a bit without letting others tamper with it. I'm not sure if this is the obvious way to do it, but it seems to work as intended.

We tried a similar version that did not work:

// Doesn't work
case class BrokenText(private var _text: String) {
  _text = _text.trim.replaceAll(" +", " ")
  val text = _text
}

This version does not work, since BrokenText.text returns the original string, not the cleaned-up one (even though toString shows the cleaned-up value):

scala> BrokenText("    a      b      ")
res0: BrokenText = BrokenText(a b)
scala> res0.text
res1: String =     a      b    
scala>

Why doesn't the second version work? Beats me. (But I'm sure the answer will turn out to be obvious.)

Update: See the two anonymous comments below: one answering my question above, the other one suggesting a neater way of handling it. Thanks.
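For comparison, here is one tidy alternative (not necessarily the one suggested in the comments): make the normalisation happen in the companion object's apply, so there is no mutable field at all. It relies on the `sealed abstract case class` idiom, under which (in Scala 2) the compiler generates neither apply nor copy, so the hand-written apply is the only way to construct a value. The name CleanText is made up, and the regex `\s+` is a deliberate change from the original `" +"`: it also collapses tabs and newlines.

```scala
sealed abstract case class CleanText(text: String)

object CleanText {
  // The only way to create a CleanText: trim the input, then collapse every
  // run of whitespace (spaces, tabs, newlines) into a single space
  def apply(raw: String): CleanText =
    new CleanText(raw.trim.replaceAll("\\s+", " ")) {}
}
```

Equality, hashCode and pattern matching still come for free from the case class.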

Thursday, 4 February 2010

Scala: Getting into performance trouble, calling head and tail on an ArrayBuffer

Update: The performance problem described below will be remedied in the final release of Scala 2.8. See martin's comment.

====================================

Recently, I wrote the following two different versions for doing the same thing (compute frequencies):

// Version 1  --- Don't do this, lousy performance
// Scala 2.8
def freq[T](seq: Seq[T]): Map[T, Int] = { 
 import annotation._
 @tailrec
 def freq(seq: Seq[T], map: Map[T, Int]): Map[T, Int] = {
   seq match {
     case s if s.isEmpty => map
     case s => {
       val elem = s.head
       val n = map.getOrElse(elem, 0) + 1
       freq(s.tail, map + (elem -> n ))
     }
   }
 }
 freq(seq, Map())
}

// Version 2 --- 260 times faster than Version 1 on some input
def freq[T](seq: Seq[T]): Map[T, Int] = {
 val freqs = collection.mutable.HashMap[T, Int]()
 for(elem <- seq) {         
   val n = freqs.getOrElseUpdate(elem, 0)         
   freqs.update(elem, n + 1)     
 }     
 // Return immutable copy of freqs     
 Map() ++ freqs
}

When comparing the two versions, it turned out that for some input, Version 1 was about 260 times slower (after JVM warm-up). The performance difference surfaced when both versions were called with the following different inputs:

val linesList = io.Source.fromPath("testfile.txt").getLines().toList
val linesSeq = io.Source.fromPath("testfile.txt").getLines().toSeq

Version 1, called with linesSeq as input, performs horribly compared to when it is called with linesList. On my own, I couldn't figure out why, but helpful and knowledgeable people at #scala solved my problem in a few seconds. The explanation appears to be that 1) the default implementation of Seq is an ArrayBuffer, and 2) calling head and tail on an ArrayBuffer is costly, since tail has to copy the remaining elements. The same operations are cheap on a List. That's why Version 1 above is a performance trap.

A possible way of getting better performance is to change the inner, two-argument freq method to use List instead of Seq:

// Version 1.b  --- Somewhat better
// Scala 2.8
def freq[T](seq: Seq[T]): Map[T, Int] = { 
 import annotation._
 @tailrec
 def freq(seq: List[T], map: Map[T, Int]): Map[T, Int] = {
   seq match {
     case s if s.isEmpty => map
     case s => {
       val elem = s.head
       val n = map.getOrElse(elem, 0) + 1
       freq(s.tail, map + (elem -> n ))
     }
   }
 }
 freq(seq.toList, Map())
}

Better yet --- in Scala 2.8 --- is to scrap the entire method, and call groupBy(identity).mapValues(_.length) directly on the Seq...
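Here is a sketch of that groupBy version, together with a foldLeft variant that never calls head or tail and so visits each element exactly once, whatever Seq implementation it gets. A strict map is used instead of mapValues because, in Scala 2.13, mapValues is deprecated and returns a lazy view.

```scala
object Frequencies {
  // groupBy collects equal elements into groups; each group's size is the count
  def freqGroupBy[T](seq: Seq[T]): Map[T, Int] =
    seq.groupBy(identity).map { case (elem, group) => elem -> group.length }

  // A single left fold over the Seq, updating an immutable Map as it goes
  def freqFold[T](seq: Seq[T]): Map[T, Int] =
    seq.foldLeft(Map.empty[T, Int]) { (counts, elem) =>
      counts.updated(elem, counts.getOrElse(elem, 0) + 1)
    }
}
```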

Sunday, 22 November 2009

Beware! scala.swing.TextField proclaims EditDone when it isn't

Update: Forget about EditDone. See Update below!

scala.swing.TextField is a basic GUI component that can be used for letting the user input a line of text. When listening to this component, one can react to an EditDone event:

// Inside some GUI component ...
val textField = new TextField(20)
contents += textField
listenTo(textField)

reactions += {case EditDone(`textField`) =>
  println("Ok, searching DB for input "+ textField.text)
}
//...

Fine. Whenever the user (me) hits the Enter key, the message "Ok, searching DB for input ...", simulating a database search, is printed.

However, what happens when some unrelated software product suddenly pops up a window while the user (me) is still inputting text into the TextField? I tell you what: The evil, non-sentient contraption prints the simulated search message --- just as if I had hit Enter.

When the TextField loses focus, it emits an EditDone event. But I'm not done editing. I've only typed "a". I was about to type "abecedarian". Now the silly thing will search the database for all words containing the letter "a". I never told it to do that. This happened just because some other, unrelated, ill-behaving program grabbed the focus.

Of course, the focus may also be lost because the user voluntarily changes windows (for instance, in order to Google for "abecedarian").

As far as I can tell, there is no sane way to tell an EditDone event produced by the user (me) hitting Enter from an EditDone event produced because the TextField component lost focus. This cannot be right.

(A while ago, I asked about this on the Scala-user list. Not one single answer from one single soul in the entire Universe. It feels lonely.)

(I'm using Scala 2.8.)

Update: Forget about EditDone.

What you should do is listen not to the TextField, but to TextField.keys. This way, you'll be able to catch a KeyPressed event and check whether the key pressed was Enter. Simple.

It's a bit tricky to figure out, however, since it's not in the TextField Scala docs (you'll have to find your way to scala.swing.Component). This is how it could look:

import swing._
import event._

//...

// Inside some GUI component ...
val textField = new TextField(20)
contents += textField

listenTo(textField.keys)

import Key._
reactions += {case KeyPressed(`textField`, Enter, _, _) =>
  println("Ok, searching DB for input "+ textField.text)
}
//...

Thanks to Ingo Maier for explaining this.

Wednesday, 29 July 2009

Scala case classes don't have auxiliary constructors?

The lesson of today is that Scala case classes don't appear to have auxiliary constructors.

In Scala, auxiliary constructors may be added to a class by defining a "this" method:


scala> class AClass(s1: String, s2: String) {
  def this(s: String) = this(s, "default")
}
defined class AClass

scala> new AClass("hey")
res0: AClass = AClass@187b5ff

Look what happens when you try the same trick on a case class:


scala> case class ACaseClass(s1: String, s2: String) {
  def this(s: String) = this(s, "default")
}
defined class ACaseClass

scala> ACaseClass("hey")
:7: error: wrong number of arguments for method apply: (String,String)ACaseClass in object ACaseClass
ACaseClass("hey")
^

The attempt at adding an auxiliary constructor compiles, but calling it as ACaseClass("hey") results in a compile-time error.

Update: Oops, yes, they can have auxiliary constructors --- see the comment below by jkriesten, straightening things out!

Update: Paul (see comment below) points to the following discussion on this topic http://www.scala-lang.org/node/976.
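To spell out what the error message is saying: the auxiliary constructor is there, but `ACaseClass("hey")` without `new` calls the companion object's apply method, and the compiler only generates an apply matching the primary constructor. A minimal sketch: call the auxiliary constructor with `new`, or add an overloaded apply to the companion yourself.

```scala
case class ACaseClass(s1: String, s2: String) {
  // Auxiliary constructor: reachable with `new ACaseClass("hey")`
  def this(s: String) = this(s, "default")
}

object ACaseClass {
  // The generated apply(s1, s2) is still there; this overload adds the
  // one-argument form, so ACaseClass("hey") compiles too
  def apply(s: String): ACaseClass = new ACaseClass(s)
}
```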

Tuesday, 9 June 2009

Printing the Unicode code points of UTF8 characters (Scala)

Sometimes it is useful to be able to print the Unicode code point of a UTF8 character. (For instance, when you need to check if you mistakenly use a similar looking character instead of the one you're supposed to use.)

Using Scala's RichString's format method, you can create a string of a zero padded, four digit, hexadecimal Unicode number, for example of the 'ä' character, like this:

scala> "%04X".format('ä'.toInt)
res0: String = 00E4

scala>

Here's a related example, printing a tab separated list of some IPA (phonetic) characters and their Unicode code points in a format suitable for using in Scala/Java strings:

scala> "ɸβfvθðszʃʒʂʐçʝxɣχʁħʕʜ"\
.map(c => "%s\t\\u%04X".format(c, c.toInt))\
.foreach(println)
ɸ \u0278
β \u03B2
f \u0066
v \u0076
θ \u03B8
ð \u00F0
s \u0073
z \u007A
ʃ \u0283
ʒ \u0292
ʂ \u0282
ʐ \u0290
ç \u00E7
ʝ \u029D
x \u0078
ɣ \u0263
χ \u03C7
ʁ \u0281
ħ \u0127
ʕ \u0295
ʜ \u029C

scala>

(The line terminating backslashes in the Scala code are added to indicate the fact that the above is a one-liner that doesn't fit the page. Remove these and the newlines if you want to run the code in the Scala shell.)

Knowing the codepoints can be useful, e.g. when you don't want to or can't input non-ASCII characters into your code:

scala> var v = "\u0278"
v: java.lang.String = ɸ

scala>

In Java, it looks similar, but you have to cast your chars to ints:

String.format("%04X", (int) 'ä'), etc.
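One caveat: `c.toInt` gives the value of a single UTF-16 Char, which equals the code point only for characters in the Basic Multilingual Plane (all the IPA characters above are). Characters beyond it, such as U+1D11E, are stored as two surrogate Chars. A small sketch that walks a string by code point instead, using `String.codePointAt` and `Character.charCount` from the Java standard library; it prints `U+XXXX` rather than a `\uXXXX` escape, since a single escape cannot represent such characters.

```scala
object CodePoints {
  // Step through the string one code point (one or two Chars) at a time
  def codePoints(s: String): List[Int] = {
    val buf = List.newBuilder[Int]
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      buf += cp
      i += Character.charCount(cp)
    }
    buf.result()
  }

  // The character, a tab, and its code point as U+ and at least four hex digits
  def describe(s: String): List[String] =
    codePoints(s).map(cp => "%s\tU+%04X".format(new String(Character.toChars(cp)), cp))
}
```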

Sunday, 8 March 2009

Scala: Reversing a string by up- and then downcasing it

Did you know that you can reverse a string by merely upcasing it and then downcasing it again? Here's an example:

scala> val s = "ςσσ"
s: java.lang.String = ςσσ

scala> s.toUpperCase.toLowerCase == s.reverse.toString
res0: Boolean = true

scala>

If you don't believe me, just copy and paste the two lines of code above into the Scala interpreter, and see it for yourself.
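(In case you're wondering how it works: the trick relies on the Greek final sigma. Both ς and σ upper-case to Σ, and when lower-casing, Java applies a context-sensitive rule that turns a Σ at the end of a word into ς and every other Σ into σ. So it only works for strings like this one. The sketch below spells out the intermediate steps; `Locale.ROOT` is used to keep the result independent of the default locale.)

```scala
import java.util.Locale

object FinalSigma {
  val s = "ςσσ"                                      // final sigma, sigma, sigma
  val upper: String = s.toUpperCase(Locale.ROOT)     // every sigma becomes capital Σ
  // Lower-casing: a word-final Σ becomes ς, the others become σ
  val lower: String = upper.toLowerCase(Locale.ROOT)
}
```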

Thursday, 18 December 2008

Scala for small throw-away scripting tasks

I've come to use Scala for tiny scripts to be thrown away after doing some small task. Typically this involves processing a few files, comparing some textual data, maybe extracting some fields of tab-separated files, etc. The kind of things that Perl used to be the obvious choice for.

Although lacking Perl's simplified syntax for iterating over all lines in files, Scala works quite nicely for small tasks.

For example, today I had to extract from a file all lines of four or more characters consisting only of upper-case letters, and capitalize the output:

scala.io.Source.fromFile(args(0))
.getLines.map(_.stripLineEnd).filter(_.matches("[A-Z]{4,}"))
.map(_.toLowerCase.capitalize).foreach(println)

Not exactly a thing of beauty, but it only took a minute and it works. And it reminds me a bit of a classic Unix command line pipeline.
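For later Scala versions, a sketch of the same filter, split into a testable function over an iterator plus a file-reading wrapper (the names are made up). Since Scala 2.8, `getLines()` already strips line terminators, so the stripLineEnd step is no longer needed.

```scala
import scala.io.Source

object UpperCaseWords {
  // Keep lines of four or more upper-case ASCII letters, then capitalize them
  def extract(lines: Iterator[String]): Iterator[String] =
    lines.filter(_.matches("[A-Z]{4,}")).map(_.toLowerCase.capitalize)

  // Hypothetical file-based entry point: reads UTF-8 and closes the file afterwards
  def printFromFile(path: String): Unit = {
    val src = Source.fromFile(path, "UTF-8")
    try extract(src.getLines()).foreach(println)
    finally src.close()
  }
}
```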

A few things on my wish-list to make Scala even better for small scripts:

A nicer way of setting the output character encoding (currently you have to do something like Console.setOut(new java.io.PrintStream(Console.out,true,"UTF8")))
It would be great if Source.getLines could remove the new line character of each line
A better name for RichString.stripLineEnd (for some reason, it is totally impossible for me to remember the name of this method)
Maybe scripting support in the Scala Netbeans plugin? (Currently, I think the plugin wants you to put your code in a class/object)

Tuesday, 9 December 2008

Scala: Beware of inadvertently shadowing variables

I've just spent 15 minutes looking for a stupid mistake in some Scala code. The problem was that I had shadowed a variable.

In some situations in Scala, you are allowed to shadow variables. In other words, it is sometimes legal to give a new variable the same name as an existing one. This can lead to mistakes. The following legal code illustrates how you can shadow a method input variable:

def theShadow(list: Array[String]): Seq[String] = {
  // Mistake! Inadvertently
  // shadowing the input parameter:
  val list = List("Asa", "nisi", "masa")
  list
}

(The above is a very obvious example. When you make this mistake in real code, it will probably be in a less obvious context.)

Scala: XML serializer adds closing elements to empty elements

When printing Scala XML nodes/elements, closing tags for empty elements are added, even if there weren't any in the input.

For example, if you input <childless/>, the XML processor will add a closing tag like this:

scala> val elem = <childless/>
elem: scala.xml.Elem = <childless></childless>

(The two versions of the XML element are equivalent, but sometimes it is practical to be able to do a simple string comparison of the input and output XML files. The added closing tags may make this harder.)

See this thread.

Friday, 5 December 2008

Scala: Problems using the XML API

I've encountered some problems using the Scala XML API. The first one had to do with scala.xml.XML.loadFile throwing away comment nodes of the input XML file.

A helpful person on the scala-user list suggested using scala.xml.parsing.ConstructingParser.fromFile instead. This worked nicely, keeping the comment elements of the input file intact. However, when processing larger XML files, this approach did not work well, resulting in out-of-memory errors.

Finally, I got yet another helpful answer on the scala-user list, this time in the form of some code translating Java XML nodes into their Scala equivalents.

If you get into the same trouble as I did, you may want to take a look at this code snippet posted on the scala-user list by David Pollak. (You might have to change the code a bit to suit your needs, though.)

Another problem I've encountered: you might be hit by a performance problem when extracting child nodes of a large Elem using the \\ or \ operators. (The fix seems to be to loop over the child nodes instead.)

Summary: The current Scala XML API may not work flawlessly if you both want to process rather large documents and at the same time keep all the information of the original input XML file... but it works fine if you write your own XML file reader (see link above) and are careful with the use of \\ or \ on large Elems.

Here's an earlier post on Scala XML processing.

Wednesday, 19 November 2008

Scala: New Netbeans 6.5 plugin

There is a new version of a Netbeans 6.5 plugin for Scala programming.

The Scala plugin already seems quite useful, and it's getting better and better with each new version.

Check it out here. This is a link to the blog of the author of the plugin.

By the way, Netbeans 6.5 was just released too.

Tuesday, 18 November 2008

Scala: The Map += method expects a Tuple: += ((k, v))

In Scala, you use the += method to add a key-value pair to a Map. The key-value pair should be in the form of a Tuple, or a Pair. You can use different syntax for such pairs: ("year", 2008), "year" -> 2008, Tuple2("year", 2008) or Pair("year", 2008):

scala> ("year",2008) == "year" -> 2008
res0: Boolean = true

scala> "year" -> 2008 == Pair("year", 2008)
res1: Boolean = true

scala> Pair("year", 2008) == Tuple2("year", 2008)
res2: Boolean = true

Thus, a few different but equal ways of adding a key-value pair to a Map:


scala> val map = new scala.collection.mutable.HashMap[String,Int]
map: scala.collection.mutable.HashMap[String,Int] = Map()

scala> map += (("year",2008))  //Notice the parentheses
scala> map += ("year" -> 2008)
scala> map += Pair("year",2008)
scala> map += Tuple2("year", 2008)

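However, this one fails because of the missing parentheses: without the inner pair, the parentheses are taken as the argument list of +=, so the method receives two separate arguments ("year" and 2008) instead of one tuple: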

scala> map += ("year",2008)
:6: error: type mismatch;
found   : java.lang.String("year")
required: (String, Int)
map+=("year",2008)
^

You can check out, e.g., this and this thread on the Scala mailing list.
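A runnable sketch of the working forms in a later Scala version (Pair has since been removed, so it is left out), plus the `map(key) = value` syntax, which the compiler rewrites to a call to update:

```scala
import scala.collection.mutable

object MapAdding {
  def build(): mutable.HashMap[String, Int] = {
    val map = mutable.HashMap.empty[String, Int]
    map += (("year", 2008))   // explicit tuple: note the double parentheses
    map += ("month" -> 11)    // -> builds the tuple, so one pair of parentheses suffices
    map += Tuple2("day", 18)
    map("hour") = 12          // sugar for map.update("hour", 12)
    map
  }
}
```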

Monday, 10 November 2008

Scala: Converting Java collections into their Scala counterparts

In the scala.collection.jcl library, you'll find Scala wrappers that add Scala methods to Java collections. This means that a Java collection (e.g., an ArrayList) will be converted to work as a Scala collection, making it possible to call foreach on an ArrayList, etc:

import scala.collection.jcl.Conversions._

val a = new java.util.ArrayList[String]
a.add("Asa")
a.add("nisi")
a.add("masa")

// foreach now works on a Java List:
a.foreach(println)

Similarly, you can now call .mkString on a Java list:

// Let's use mkString to print the
// ArrayList contents as a Prolog spell/3 fact:

println(a.mkString("spell('", "', '", "')."))

// -> spell('Asa', 'nisi', 'masa').

See this Scala mailing list thread.
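The jcl package is long gone. In Scala 2.13 and later, the corresponding conversions live in scala.jdk.CollectionConverters and are explicit: you call asScala on the Java collection. A minimal sketch of the same Prolog example under that assumption:

```scala
import scala.jdk.CollectionConverters._

object JavaListDemo {
  def names(): java.util.ArrayList[String] = {
    val a = new java.util.ArrayList[String]
    a.add("Asa")
    a.add("nisi")
    a.add("masa")
    a
  }

  // asScala wraps the Java list (without copying) as a mutable.Buffer,
  // so Scala methods such as mkString and foreach are available
  def asPrologFact(list: java.util.List[String]): String =
    list.asScala.mkString("spell('", "', '", "').")
}
```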

Scala: You cannot run a companion object as a stand-alone program

Update: In Scala 2.8, the below is no longer true. A companion object can now work as the entry point of an application.

===============================

In the Scala programming language, a companion object is an object with the same name as a class in the same source file. (A companion object can hold what would be static methods in Java.)

An object definition on its own can function as the entry point for running a Scala program. Compiling and running this object works fine:

object heyYouTheRocksteadyCrew {
  def main(args: Array[String]): Unit = {
    println("Make a break!")
  }
}

However, if you try to run the same object when it is a companion object to a class with the same name, this will result in an exception:

class heyYouTheRocksteadyCrew {}

object heyYouTheRocksteadyCrew {
  def main(args: Array[String]): Unit = {
    println("Make a move!")
  }
}

java.lang.NoSuchMethodException:
heyYouTheRocksteadyCrew.main([Ljava.lang.String;)

The above is true of the current release, 2.7.2.final. (Until this is fixed, these guys will not be too happy about any stupid heyYouTheRocksteadyCrew-exception...!)

There is at least one thread about the above on the Scala mailing list.

Friday, 5 September 2008

Scala: String vs RichString oddities

Update: In Scala 2.8, the below is no longer true. String.reverse now returns a String rather than a RichString:


scala> "a".reverse == "a"
res0: Boolean = true

=========================================

In the Scala programming language, there is a class called RichString, that adds features to the underlying Java String. In the current version of Scala (2.7.2.final), this leads to some odd behaviour:

"Im a string" == "Im a string".reverse.reverse

returns false, while

"Im a string" == "Im a string".reverse.reverse.toString

returns true!

Just to make your head spin, the following code does indeed work as expected:

val str :String = "Im a string".reverse.reverse
println(str == "Im a string") // prints "true"

while

val str = "Im a string".reverse.reverse
println(str == "Im a string") // prints "false"

does not.

The explanation is that String.reverse returns a RichString, and that == returns false when comparing a String and a RichString, even though it is the "same" string (as in the example above).

If I understand it correctly, this oddity will be fixed in future releases of Scala.

(And no, Scala's == is not the same as Java's ditto. It means "equal objects" rather than "refers to the same instance of an object".)
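To make the distinction concrete in Scala 2.8 and later, where reverse returns a String: == calls equals (after a null check), while eq compares references the way Java's == does. A small sketch:

```scala
object StringEquality {
  val original = "Im a string"
  // reverse builds a new String object each time
  val roundTrip: String = original.reverse.reverse

  val sameValue: Boolean = roundTrip == original   // value equality
  val sameObject: Boolean = roundTrip eq original  // reference equality
}
```

Here sameValue is true and sameObject is false.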

Scala mailing list item here.

Saturday, 30 August 2008

Scala and implicit conversion: Turning a string into pure Weirdness

In the Scala programming language, you can turn water into wine, or vice versa, using implicit conversion.

Imagine that you have a class called Weird:


class Weird(s :String) {
  def imWeird :String = {
    "I'm "+ s +" and I'm weird!"
  }
}

It consists of merely a string, s, and a method, imWeird, that returns a jolly message containing the very same string. (Thus, the code

val freak = new Weird("a freak")
println(freak.imWeird)

outputs I'm a freak and I'm weird!.)

Now, Scala allows you to create an implicit conversion that adds the method(s) of Weird to any other class. Or rather, turns an object into a Weird whenever one calls Weird's methods (functions) on the given object.

For example, the following implicit conversion

  implicit def string2Weird(s: String): Weird = new Weird(s)

makes it possible to call Weird's method(s) on a String. This code

val happy = "Happy"
println(happy.imWeird)

will now output

  I'm Happy and I'm weird!

The name of the implicit conversion method, string2Weird, is arbitrary.
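Here is a complete, runnable version for later Scala versions. Newer compilers warn about implicit conversions unless `scala.language.implicitConversions` is imported, and since Scala 2.10 an implicit class bundles the wrapper and the conversion into one definition. The object names and the imWeirdToo method are made up for the example.

```scala
import scala.language.implicitConversions

class Weird(s: String) {
  def imWeird: String = "I'm " + s + " and I'm weird!"
}

object WeirdConversions {
  // The conversion from the post, with an explicit result type
  implicit def string2Weird(s: String): Weird = new Weird(s)
}

object WeirdSyntax {
  // Wrapper class and conversion in one definition
  implicit class WeirdOps(s: String) {
    def imWeirdToo: String = "I'm " + s + " and I'm weird too!"
  }
}

object WeirdDemo {
  import WeirdConversions._
  import WeirdSyntax._
  // The compiler rewrites these calls to string2Weird("Happy").imWeird
  // and new WeirdOps("Happy").imWeirdToo
  def viaConversion: String = "Happy".imWeird
  def viaImplicitClass: String = "Happy".imWeirdToo
}
```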

Friday, 16 May 2008

Scala one-liner for upcasing lines of text

The following is a Scala script that upcases each line of a UTF-8 encoded input file (args(0)) and prints the result to standard output:

import scala.io.Source

Console.setOut(new java.io.PrintStream(Console.out,true,"UTF8"))

Source.fromFile(args(0), "UTF8").getLines.foreach(line => print(line.toUpperCase))

If you're trusting the default character encoding to work for you, you may reduce it to:


import scala.io.Source

Source.fromFile(args(0)).getLines.foreach(line => print(line.toUpperCase))

Another way to do it is to read the lines into an iterator, using the iterator's .map method to upcase each line:


import scala.io.Source

val lines = Source.fromFile(args(0)).getLines.map(_.toUpperCase)

lines.foreach(print)
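The print calls above rely on getLines keeping the line terminators, as it did in Scala 2.7. Since Scala 2.8, getLines() strips them, so println is needed instead. A sketch for later versions that reads with an explicit encoding and closes the source afterwards (the object and method names are made up, and upcase collects all lines into a List rather than streaming them):

```scala
import scala.io.Source

object UpcaseLines {
  // Upcase every line; the terminators are already stripped by getLines()
  def upcase(src: Source): List[String] =
    try src.getLines().map(_.toUpperCase).toList
    finally src.close()

  // Hypothetical entry point: read a UTF-8 file and print it upcased
  def printFile(path: String): Unit =
    upcase(Source.fromFile(path, "UTF-8")).foreach(println)
}
```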

A Java programmer may be relieved (or horrified) to learn that Scala does not have checked exceptions: the compiler never forces you to catch or declare an exception, so you don't need to add any try/catch statements if you don't want to.

When you run a Scala script, you can instruct the Scala interpreter to compile the script and reuse the compiled version (a jar file) as long as it is newer than the source file. This gives better performance (shorter start-up, etc.). You do this with the -savecompiled command-line option.