Friday 5 September 2008

Scala: String vs RichString oddities

Update: As of Scala 2.8, the behaviour described below no longer occurs. String.reverse now returns a String rather than a RichString:


scala> "a".reverse == "a"
res0: Boolean = true

=========================================

In the Scala programming language, there is a class called RichString that adds features to the underlying Java String. In the current version of Scala (2.7.2.final), this leads to some odd behaviour:
"Im a string" == "Im a string".reverse.reverse
returns false, while
"Im a string" == "Im a string".reverse.reverse.toString
returns true!

Just to make your head spin, the following code does indeed work as expected:
val str :String = "Im a string".reverse.reverse
println(str == "Im a string") // prints "true"
while
val str = "Im a string".reverse.reverse
println(str == "Im a string") // prints "false"
does not.



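The explanation is that String.reverse returns a RichString, and that == returns false when comparing a String with a RichString, even when both hold the "same" string (as in the examples above). Calling toString turns the RichString back into a plain String, which is why that comparison returns true.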

If I understand it correctly, this oddity will be fixed in future releases of Scala.

(And no, Scala's == is not the same as Java's ==. In Scala it means "equal objects" rather than "refers to the same instance of an object".)
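
For readers coming from Java, here is a rough sketch of the two kinds of comparison. The class and method names (EqualityDemo, sameInstance, equalObjects) are made up for illustration. The StringBuilder part is only an analogy for the String/RichString mismatch, not what Scala does internally: a String is never equal to an object of another type, even one holding the same characters.

```java
public class EqualityDemo {
    // Java's ==: true only when both references point at the same instance.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    // Value comparison: delegates to equals(), which is what Scala's == does.
    static boolean equalObjects(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String a = new String("Im a string");
        String b = new String("Im a string");
        System.out.println(sameInstance(a, b));  // false: two distinct objects
        System.out.println(equalObjects(a, b));  // true: same characters

        // Analogy only: a different type wrapping the same characters.
        StringBuilder wrapped = new StringBuilder("Im a string");
        System.out.println(equalObjects(a, wrapped));             // false
        System.out.println(equalObjects(a, wrapped.toString()));  // true
    }
}
```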

Scala mailing list item here.

Case insensitive pattern matching of Unicode strings in Java

To do case-insensitive pattern matching of Unicode strings in Java, you can call Pattern.compile with a second argument that combines two flags, like this:

Pattern p =
    Pattern.compile(patternString, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);


(This is useful when dealing with non-ASCII/non-Latin1 text, such as Cyrillic. However, it may not work flawlessly for Turkish, where the dotted and dotless i follow locale-specific case rules.)
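
To see why the second flag matters, here is a small sketch comparing the two flag combinations on a Cyrillic word. The class and method names (UnicodeCaseDemo, matchesAsciiOnly, matchesUnicode) are hypothetical, and the test word is just an example. The strings are written as \u escapes so that the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class UnicodeCaseDemo {
    // CASE_INSENSITIVE alone only folds case for US-ASCII characters.
    static boolean matchesAsciiOnly(String patternString, String input) {
        Pattern p = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).matches();
    }

    // Adding UNICODE_CASE extends case folding to Unicode letters.
    static boolean matchesUnicode(String patternString, String input) {
        Pattern p = Pattern.compile(
            patternString, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u043f\u0440\u0438\u0432\u0435\u0442"; // "privet" in lower-case Cyrillic
        String upper = "\u041f\u0420\u0418\u0412\u0415\u0422"; // the same word in upper case

        System.out.println(matchesAsciiOnly("hello", "HELLO")); // true
        System.out.println(matchesAsciiOnly(lower, upper));     // false
        System.out.println(matchesUnicode(lower, upper));       // true
    }
}
```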

Update: I just learned that there is a nicer way of doing this: prefix the patternString above with the embedded flags "(?iu)", where i stands for CASE_INSENSITIVE and u for UNICODE_CASE:
Pattern p =
    Pattern.compile("(?iu)" + patternString);
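
A minimal sketch of the embedded-flag form, again with made-up names (EmbeddedFlagsDemo, withEmbeddedFlags) and the same example word. One practical upside is that the prefix also works with methods that take the pattern as a plain string and have no flags argument, such as String.matches.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagsDemo {
    // Same effect as passing CASE_INSENSITIVE | UNICODE_CASE as the second argument.
    static Pattern withEmbeddedFlags(String patternString) {
        return Pattern.compile("(?iu)" + patternString);
    }

    public static void main(String[] args) {
        String lower = "\u043f\u0440\u0438\u0432\u0435\u0442"; // lower-case Cyrillic word
        String upper = "\u041f\u0420\u0418\u0412\u0415\u0422"; // the same word in upper case

        Pattern p = withEmbeddedFlags(lower);
        System.out.println(p.matcher(upper).matches());     // true

        // String.matches has no flags parameter, so the prefix is the way in.
        System.out.println(upper.matches("(?iu)" + lower)); // true
        System.out.println(upper.matches(lower));           // false
    }
}
```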