Update: There is another, often better way than the one described below: look up the rune's official Unicode name, which for letters usually begins with the script name (ARABIC in the example here):
import "golang.org/x/text/unicode/runenames"
...
name := runenames.Name('م') // ARABIC LETTER MEEM
...
https://play.golang.org
Below is an alternative way:
If you want to know which script (Latin, Arabic, Cyrillic, and so on) a character (rune) belongs to in Go, you can use the Scripts map in the standard unicode package:
r := 'ن' // The isolated form of Arabic 'n'
for s, t := range unicode.Scripts {
	if unicode.In(r, t) {
		fmt.Println(s) // Arabic
	}
}
https://play.golang.org/
The map unicode.Scripts is keyed by script name, such as Latin, Greek, Arabic or Cyrillic. Each name maps to a *unicode.RangeTable, which represents the subset of Unicode code points assigned to that script. The unicode.In function in the snippet above reports whether the rune r is found in the RangeTable t.
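The loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the standard library: the name scriptOf is made up for illustration, and it relies on the assumption that each code point is listed in at most one of the script tables, so the random iteration order of the map does not affect the result.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf returns the name of the script table in unicode.Scripts that
// contains r, or "" if no table contains it (for example, an unassigned
// code point). scriptOf is a hypothetical helper name.
func scriptOf(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

func main() {
	// Latin a, Cyrillic es (U+0441), Arabic noon (U+0646), digit one.
	for _, r := range []rune{'a', '\u0441', '\u0646', '1'} {
		fmt.Printf("%U %s\n", r, scriptOf(r))
	}
}
```

Note that digits, spaces and most punctuation are reported as Common, because they are shared between scripts.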
Checking which script a character belongs to can be useful for validating that all characters of a string belong to the same script. For example, the Latin and Cyrillic scripts contain characters that look identical but are represented by different Unicode code points, such as c-с, p-р and a-а. If you mix Latin and Cyrillic characters in a string, an exact-match database search, for instance, may fail to find a record that looks identical on screen.
c1 := 'c' // Latin
c2 := 'с' // Cyrillic
fmt.Println(c1 == c2) // false
fmt.Printf("%U\n", c1) // U+0063
fmt.Printf("%U\n", c2) // U+0441
https://play.golang.org/
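To detect such mixed strings, one option is to collect the scripts of all runes in the string. This is a rough sketch under some assumptions: scriptsIn is a hypothetical helper name, and it skips the Common and Inherited tables so that spaces, digits, punctuation and combining marks do not count as a separate script.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// scriptsIn returns the sorted names of the scripts used by the runes in s.
// Runes in the Common and Inherited tables are ignored.
// scriptsIn is a hypothetical helper name, not a standard library function.
func scriptsIn(s string) []string {
	seen := map[string]bool{}
	for _, r := range s {
		for name, table := range unicode.Scripts {
			if name == "Common" || name == "Inherited" {
				continue
			}
			if unicode.Is(table, r) {
				seen[name] = true
				break // stop searching tables for this rune
			}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(scriptsIn("paypal"))             // [Latin]
	fmt.Println(scriptsIn("p\u0430yp\u0430l")) // [Cyrillic Latin]
}
```

A string for which scriptsIn returns more than one name mixes scripts and may deserve a closer look. Legitimate multilingual text also mixes scripts, so treat the result as a hint rather than as proof of a problem.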