Wednesday, 21 May 2008

Saldo 1.0: Large, freely available Swedish morphological and semantic lexicon

Språkbanken has published Saldo, a large, freely available Swedish lexicon described as "a Swedish basic language resource". The release appears to include some 68,000 lemmas (uninflected base forms) as well as more than 740,000 expanded (full) word forms. Entries carry both morphological and semantic information.

This resource should be valuable for part-of-speech tagging, lemmatization, spell-checking, and (semantic) analysis of Swedish text, among other tasks.
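
To give a rough idea of how a full-form lexicon supports lemmatization, here is a minimal Python sketch. It assumes a made-up tab-separated layout (word form, lemma, part of speech) and a handful of hand-written sample rows; it does not reflect Saldo's actual file format or tag set, and the names `build_index` and `lemmatize` are purely illustrative.

```python
from collections import defaultdict

# Hypothetical sample rows: word form <TAB> lemma <TAB> part of speech.
# This layout and these tags are assumptions for illustration only,
# not Saldo's real format.
SAMPLE_ROWS = """\
bil\tbil\tnoun
bilen\tbil\tnoun
bilar\tbil\tnoun
springer\tspringa\tverb
sprang\tspringa\tverb
springa\tspringa\tverb
springa\tspringa\tnoun
"""


def build_index(text):
    """Map each lowercased word form to the set of (lemma, pos) readings."""
    index = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        form, lemma, pos = line.split("\t")
        index[form.lower()].add((lemma, pos))
    return index


def lemmatize(index, word):
    """Return all known (lemma, pos) readings for a word, sorted; empty if unknown."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_index(SAMPLE_ROWS)
print(lemmatize(INDEX, "Bilar"))
print(lemmatize(INDEX, "springa"))
```

A form can map to more than one reading (here "springa" as both verb and noun), which is why the lookup returns a list and leaves disambiguation to a later step such as a part-of-speech tagger.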

The release includes software for interfacing with the lexicon. (Those of you into functional programming might be interested in the fact that the lexicon software is written in Haskell.)

It is released under the LGPL.

