val avgHeight: Option[Float] = from(people)(p => compute(avg(p.heightInCentimeters)))
Thanks a lot for the great answers!
Sometimes, one runs into UTF-8 strings with characters from different Unicode blocks. This is problematic when the glyphs look the same but the characters are different. The Scala REPL is handy for finding out which Unicode block each character in a string belongs to. Let's use "ЕКАТEРИНБУРГ" and "ЕКАТЕРИНБУРГ" as examples:
scala> "ЕКАТEРИНБУРГ" == "ЕКАТЕРИНБУРГ"
res0: Boolean = false
scala> import java.lang.Character.UnicodeBlock
import java.lang.Character.UnicodeBlock
scala> "ЕКАТEРИНБУРГ".foreach(c => println(c +"\t"+ UnicodeBlock.of(c)))
Е CYRILLIC
К CYRILLIC
А CYRILLIC
Т CYRILLIC
E BASIC_LATIN
Р CYRILLIC
И CYRILLIC
Н CYRILLIC
Б CYRILLIC
У CYRILLIC
Р CYRILLIC
Г CYRILLIC
scala> "ЕКАТЕРИНБУРГ".foreach(c => println(c +"\t"+ UnicodeBlock.of(c)))
Е CYRILLIC
К CYRILLIC
А CYRILLIC
Т CYRILLIC
Е CYRILLIC
Р CYRILLIC
И CYRILLIC
Н CYRILLIC
Б CYRILLIC
У CYRILLIC
Р CYRILLIC
Г CYRILLIC
scala>
The REPL exposed one of the seemingly identical strings to be an unhealthy mix of Latin and Cyrillic characters. Thanks, REPL.
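The same check can be wrapped in a small helper instead of eyeballing the REPL output. This is only a sketch: `BlockCheck`, `blocks` and `isMixed` are names made up for illustration, and the check works per UTF-16 `Char`, just like the `foreach` above.

```scala
import java.lang.Character.UnicodeBlock

object BlockCheck {
  // The set of Unicode blocks that occur in s, looked up per Char.
  def blocks(s: String): Set[UnicodeBlock] =
    s.map(c => UnicodeBlock.of(c)).toSet

  // True if the characters of s come from more than one Unicode block.
  def isMixed(s: String): Boolean = blocks(s).size > 1

  // "\u0415" is CYRILLIC CAPITAL LETTER IE, "E" is the Latin capital E.
  val suspicious: Boolean = isMixed("\u0415\u041A\u0410\u0422E")
  val clean: Boolean = isMixed("\u0415\u041A\u0410\u0422\u0415")
}
```

Note that spaces, digits and ASCII punctuation all live in BASIC_LATIN, so a Cyrillic sentence with spaces in it would also count as "mixed" under this simple definition.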
We needed to clean up user input entered into a text field. We ended up with a Scala case class that tidies its constructor string argument a bit, by trimming it and collapsing runs of spaces into a single space. It behaves like this:
scala> Text(" a a ") == Text("a a")
res0: Boolean = true
scala> Text(" a a ").text == Text("a a").text
res1: Boolean = true
scala> Text(" a a ").text
res2: java.lang.String = a a
case class Text(private var _text: String) {
  val text = _text.trim.replaceAll(" +", " ")
  _text = text
}
Since the input string, var _text, is private, we can manipulate it a bit without making it possible for others to tamper with it. I'm not sure if this is the obvious way to do it, but it seems to work as intended.
// Doesn't work
case class BrokenText(private var _text: String) {
  _text = _text.trim.replaceAll(" +", " ")
  val text = _text
}
This version does not work, since BrokenText.text will return the original string, not the cleaned-up one:
scala> BrokenText(" a b ")
res0: BrokenText = BrokenText(a b)
scala> res0.text
res1: String = a b
scala>
Why doesn't the second version work? Beats me. (But I'm sure the answer will turn out to be obvious.)
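For comparison, here is a sketch of another way to get the same behaviour without a mutable constructor parameter: a plain class with a private constructor and a companion `apply` that does the cleaning. This is not the class we ended up with; `TextDemo` and `normalize` are names invented for this example, and equality is written out by hand because the class is no longer a case class.

```scala
object TextDemo {
  // Hypothetical helper: trim, then collapse runs of spaces into one space.
  def normalize(raw: String): String = raw.trim.replaceAll(" +", " ")

  // The constructor is private, so every instance goes through Text.apply
  // and therefore always holds a cleaned-up string.
  final class Text private (val text: String) {
    override def equals(other: Any): Boolean = other match {
      case that: Text => that.text == text
      case _          => false
    }
    override def hashCode: Int = text.hashCode
    override def toString: String = "Text(" + text + ")"
  }

  object Text {
    def apply(raw: String): Text = new Text(normalize(raw))
  }
}
```

With this sketch, `TextDemo.Text("  a   a  ") == TextDemo.Text("a a")` holds, and `.text` only ever exposes the cleaned string.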
Update: The performance problem described below will be remedied in the final release of Scala 2.8. See martin's comment.
====================================
Recently, I wrote the following two different versions for doing the same thing (compute frequencies):
// Version 1 --- Don't do this, lousy performance
// Scala 2.8
def freq[T](seq: Seq[T]): Map[T, Int] = {
  import annotation._
  @tailrec
  def freq(seq: Seq[T], map: Map[T, Int]): Map[T, Int] = {
    seq match {
      case s if s.isEmpty => map
      case s => {
        val elem = s.head
        val n = map.getOrElse(elem, 0) + 1
        freq(s.tail, map + (elem -> n))
      }
    }
  }
  freq(seq, Map())
}
// Version 2 --- 260 times faster than Version 1 on some input
def freq[T](seq: Seq[T]): Map[T, Int] = {
  val freqs = collection.mutable.HashMap[T, Int]()
  for (elem <- seq) {
    val n = freqs.getOrElseUpdate(elem, 0)
    freqs.update(elem, n + 1)
  }
  // Return immutable copy of freqs
  Map() ++ freqs
}
val linesList = io.Source.fromPath("testfile.txt").getLines().toList
val linesSeq = io.Source.fromPath("testfile.txt").getLines().toSeq
Version 1, when called with linesSeq as input, performs horribly compared to when it is called with linesList. On my own, I couldn't figure out why, but helpful and knowledgeable people at #scala solved my problem in a few seconds. The explanation appears to be that 1) the default implementation of Seq is an ArrayBuffer, and 2) calling head and tail on an ArrayBuffer is costly (tail in particular, since it copies the remaining elements on every call). The same operations are cheap on a List. That's why Version 1 above is a performance trap.
One fix is to change the inner freq method to use List, instead of Seq:
// Version 1.b --- Somewhat better
// Scala 2.8
def freq[T](seq: Seq[T]): Map[T, Int] = {
  import annotation._
  @tailrec
  def freq(seq: List[T], map: Map[T, Int]): Map[T, Int] = {
    seq match {
      case s if s.isEmpty => map
      case s => {
        val elem = s.head
        val n = map.getOrElse(elem, 0) + 1
        freq(s.tail, map + (elem -> n))
      }
    }
  }
  freq(seq.toList, Map())
}
Alternatively, call groupBy(identity).mapValues(_.length) directly on the Seq ...
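If you want to keep an immutable map but avoid the head/tail trap altogether, a fold is one more option. The sketch below is not one of the versions I measured; it swaps the explicit recursion for `foldLeft`, which walks the sequence once and never calls `tail`, so it should not depend on the input being a List. `FoldFreq` is a made-up name, and the parameter is typed as `scala.collection.Seq` (my assumption) so that mutable sequences are accepted on newer Scala versions too.

```scala
import scala.collection.mutable.ArrayBuffer

object FoldFreq {
  // Counts occurrences by folding over the sequence with an immutable Map
  // as the accumulator; no head/tail calls are involved.
  def freq[T](seq: scala.collection.Seq[T]): Map[T, Int] =
    seq.foldLeft(Map.empty[T, Int]) { (counts, elem) =>
      counts.updated(elem, counts.getOrElse(elem, 0) + 1)
    }

  // Works the same way for a List and for an ArrayBuffer.
  val fromList = freq(List("a", "b", "a"))
  val fromBuffer = freq(ArrayBuffer("a", "b", "a"))
}
```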
I often need to count the frequencies of strings ("words", typically). Below are a few Scala snippets for counting strings and things. (Don't miss the last one.)
First try
Let's start with a method for counting string frequencies in a list:
// Scala 2.8
def freq(wds: List[String]): Map[String, Int] = {
  import annotation._
  @tailrec
  def freq(wds: List[String], map: Map[String, Int]): Map[String, Int] = {
    wds match {
      case l if l.isEmpty => map
      case l => {
        val elem = l.head
        val n = map.getOrElse(elem, 0) + 1
        freq(l.tail, map + (elem -> n))
      }
    }
  }
  freq(wds, Map())
}
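The outer method simply starts the inner, recursive one with an empty map: freq(wds, Map()). The import annotation._ and @tailrec part asks the compiler to check whether it can optimize the tail-recursive call, and to report an error if it cannot. (The Scala compiler can optimize only a special case of tail recursion: a method that calls itself directly in tail position.)
A more imperative alternative is to use a mutable HashMap instead:
def freq(wds: List[String]): Map[String, Int] = {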
  val freqs = collection.mutable.HashMap[String, Int]()
  for (w <- wds) {
    val n = freqs.getOrElseUpdate(w, 0)
    freqs.update(w, n + 1)
  }
  // Return immutable copy of freqs
  Map() ++ freqs
}
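As a side note, the getOrElseUpdate/update pair can be shortened by giving the mutable map a default value. This is just a sketch of that variation, not a replacement for the code above; `MutableFreq` is a name invented for the example.

```scala
import scala.collection.mutable

object MutableFreq {
  def freq(wds: List[String]): Map[String, Int] = {
    // Missing keys read as 0, so the counter can be bumped in one step.
    val freqs = mutable.HashMap[String, Int]().withDefaultValue(0)
    for (w <- wds) freqs(w) += 1
    // Return an immutable copy of freqs
    freqs.toMap
  }
}
```

Here `freqs(w) += 1` expands to `freqs.update(w, freqs(w) + 1)`, which is the same read-then-write as before, only shorter.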
Both versions above only accept List input. There is a more general concept, Seq, that makes it possible to call freq with different kinds of sequences (lists, list buffers, arrays):
// Scala 2.8
def freq(wds: Seq[String]): Map[String, Int] = {
  import annotation._
  @tailrec
  def freq(wds: Seq[String], map: Map[String, Int]): Map[String, Int] = {
    wds match {
      case l if l.isEmpty => map
      case l => {
        val elem = l.head
        val n = map.getOrElse(elem, 0) + 1
        freq(l.tail, map + (elem -> n))
      }
    }
  }
  freq(wds, Map())
}
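A quick usage sketch of what the Seq version buys you. It repeats the method in a compact form so that the snippet stands on its own; `SeqFreqDemo` and the inner name `loop` are made up for the example. One assumption to flag: the parameter is typed as `scala.collection.Seq` rather than plain `Seq`, because on Scala 2.13 and later plain `Seq` means the immutable one and would not accept a ListBuffer.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.ListBuffer

object SeqFreqDemo {
  // Same idea as the method above, written against scala.collection.Seq.
  def freq(wds: scala.collection.Seq[String]): Map[String, Int] = {
    @tailrec
    def loop(rest: scala.collection.Seq[String], acc: Map[String, Int]): Map[String, Int] =
      if (rest.isEmpty) acc
      else loop(rest.tail, acc + (rest.head -> (acc.getOrElse(rest.head, 0) + 1)))
    loop(wds, Map.empty)
  }

  // The same call works for a list, a list buffer and an array.
  val fromList = freq(List("a", "b", "a"))
  val fromBuffer = freq(ListBuffer("a", "b", "a"))
  val fromArray = freq(Array("a", "b", "a"))
}
```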
A String, however, is a sequence of Chars, not Strings. The code above will not help you count character frequencies. Here is an attempt at generalising the code further, to make it able to count the frequencies of any kind of thing, T, not just String:
// Scala 2.8
def freq[T](seq: Seq[T]): Map[T, Int] = {
  import annotation._
  @tailrec
  def freq(seq: Seq[T], map: Map[T, Int]): Map[T, Int] = {
    seq match {
      case s if s.isEmpty => map
      case s => {
        val elem = s.head
        val n = map.getOrElse(elem, 0) + 1
        freq(s.tail, map + (elem -> n))
      }
    }
  }
  freq(seq, Map())
}
The same generalisation applies to the mutable HashMap version:
def freq[T](seq: Seq[T]): Map[T, Int] = {
  val freqs = collection.mutable.HashMap[T, Int]()
  for (elem <- seq) {
    val n = freqs.getOrElseUpdate(elem, 0)
    freqs.update(elem, n + 1)
  }
  // Return immutable copy of freqs
  Map() ++ freqs
}
def freq[T](seq: Seq[T]) = seq.groupBy(x => x).mapValues(_.length)
It's so short that it's almost not worth defining a method/function for it. You can simply call .groupBy(x => x).mapValues(_.length) directly on your Seq. (Or groupBy(identity).mapValues(_.length), which is the same thing.)
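Here is the groupBy approach as a small, self-contained sketch that counts both words and characters. One deliberate difference from the one-liner above: it uses `map` instead of `mapValues`, because on Scala 2.13 and later `mapValues` is deprecated and returns a lazy view rather than a Map. `GroupByFreq` is a made-up name, and typing the parameter as `scala.collection.Seq` is my assumption, to keep the sketch usable with mutable sequences.

```scala
object GroupByFreq {
  // Group equal elements together, then replace each group by its size.
  def freq[T](seq: scala.collection.Seq[T]): Map[T, Int] =
    seq.groupBy(identity).map { case (elem, occurrences) => elem -> occurrences.size }

  // Counting words ...
  val words = freq(List("to", "be", "or", "not", "to", "be"))
  // ... and counting characters, by viewing the String as a Seq[Char].
  val chars = freq("hello".toSeq)
}
```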