Sunday 11 April 2010

A tiny Scala case class to clean up user input

We needed to clean up user input entered into a text field. We ended up with a Scala case class that normalises its constructor string argument by trimming it and collapsing runs of spaces into a single space. It behaves like this:

scala> Text("    a       a      ") == Text("a a")
res0: Boolean = true
scala> Text(" a a ").text == Text("a a").text
res1: Boolean = true
scala> Text(" a a ").text
res2: java.lang.String = a a

The code looks like this:
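case class Text(private var _text: String) {
  // Cleaned-up value: trimmed, with runs of spaces collapsed to one.
  val text = _text.trim.replaceAll(" +", " ")
  // Store the cleaned value back so equals, hashCode and toString use it.
  _text = text
}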
Since the input string, var _text, is private, we can overwrite it with the cleaned-up value without making it possible for others to tamper with it. I'm not sure if this is the obvious way to do it, but it seems to work as intended.
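One thing to be aware of: the pattern " +" only collapses runs of the space character, so tabs and newlines inside the string are left alone. If the text field can contain those, a variant based on \s+ may be closer to the intent. A minimal sketch for comparison, where WhitespaceDemo, collapseSpaces and collapseWhitespace are hypothetical names that are not part of the code above:

```scala
object WhitespaceDemo {
  // Same cleanup as in Text: trim, then collapse runs of spaces only.
  def collapseSpaces(s: String): String =
    s.trim.replaceAll(" +", " ")

  // Variant (an assumption about the intent, not what Text does):
  // collapse any run of whitespace, including tabs and newlines,
  // into a single space.
  def collapseWhitespace(s: String): String =
    s.trim.replaceAll("\\s+", " ")
}
```

Whether the stricter variant is wanted depends on what the text field can actually deliver; for single-line input the two should behave the same.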

We tried a similar version that did not work:
// Doesn't work
case class BrokenText(private var _text: String) {
  _text = _text.trim.replaceAll(" +", " ")
  val text = _text
}
This version does not work since BrokenText.text returns the original string, not the cleaned-up one, even though the printed BrokenText(a b) shows that the field itself was cleaned:
scala> BrokenText("    a      b      ")
res0: BrokenText = BrokenText(a b)
scala> res0.text
res1: String =     a      b      
Why doesn't the second version work? Beats me. (But I'm sure the answer will turn out to be obvious.)

Update: See the two anonymous comments below: one answering my question above, the other one suggesting a neater way of handling it. Thanks.
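
Following the explanation in the first comment, one way to avoid depending on the order of the initialisation statements at all is to clean the string before the instance is constructed. This is only a sketch under my own assumptions: NormalizedText is a hypothetical name, and it is a plain class with hand-written equals, hashCode and toString rather than a case class, so it gives up pattern matching and copy.

```scala
// Hypothetical alternative: the constructor is private, so the only way
// to build an instance is through the companion's apply, which cleans
// the input first.
final class NormalizedText private (val text: String) {
  override def equals(other: Any): Boolean = other match {
    case that: NormalizedText => text == that.text
    case _                    => false
  }
  override def hashCode: Int = text.hashCode
  override def toString: String = "NormalizedText(" + text + ")"
}

object NormalizedText {
  // Same cleanup as in Text: trim, then collapse runs of spaces.
  def apply(raw: String): NormalizedText =
    new NormalizedText(raw.trim.replaceAll(" +", " "))
}
```

Because the constructor only ever sees the cleaned string, there is no var to reassign and nothing for the statement order to get wrong.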

2 comments:

Anonymous said...

The answer to your question (why is the second version broken) can be seen by compiling BrokenText with "scala -print".

The constructor shows:
def this(_text: java.lang.String): BrokenText = {
  BrokenText.this._text = _text;
  BrokenText.super.this();
  scala.Product$class./*Product$class*/$init$(BrokenText.this);
  BrokenText.this._text_=(_text.trim().replaceAll(" +", " "));
  BrokenText.this.text = _text;
  ()
}

(I hope this doesn't come out too badly formatted.)

So the answer is: in the constructor body, val text is initialised from the constructor parameter '_text', not from the internal field that backs the var.
(Put differently: val text = _text does not go through the field's accessor method.)

To my eyes, it seems reasonable that there is no such implicit dependency on the order of the initialisation statements.

Anonymous said...

What I don't understand is why you need a new type when a simple conversion function would suffice.

Is it vital to distinguish a string of type String from a string of type Text?

object Text {
  def apply(_text: String) = _text.trim.replaceAll(" +", " ")
}


Text(" a a ") == Text("a a")
res11: Boolean = true

Text(" a a ") == "a a"
res12: Boolean = true