With the help of Java's Character class, one can identify which Unicode block a character belongs to. This can be useful when, for example, validating a string to detect peculiar mixtures of characters from different blocks (see an example in a previous post).
The following code

    Character.UnicodeBlock ub = null;
    ub = Character.UnicodeBlock.of('\u042F');
    System.out.println(ub);
    ub = Character.UnicodeBlock.of('۲');
    System.out.println(ub);

outputs

    CYRILLIC
    ARABIC
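The following method returns the set of code blocks for the characters of a string:

    import java.lang.Character.UnicodeBlock;
    import java.util.HashSet;
    import java.util.Set;

    // Collects the Unicode block of every char in s.
    Set<UnicodeBlock> getUnicodeCodeBlocks(final String s)
    {
        Set<UnicodeBlock> result = new HashSet<UnicodeBlock>();
        for(char c : s.toCharArray())
        {
            // of(char) returns null if c is not a member of a defined block
            result.add(Character.UnicodeBlock.of(c));
        }
        return result;
    }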
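One caveat: the method above walks the string char by char, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) is reported as HIGH_SURROGATES and LOW_SURROGATES rather than as its real block. Below is a minimal sketch of a variant based on code points, assuming Java 8 or later. The class name UnicodeBlocks and the method name getUnicodeCodeBlocksByCodePoint are hypothetical and not from the original post, and skipping null results is a choice made for this sketch only.

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashSet;
import java.util.Set;

public class UnicodeBlocks {

    // Hypothetical helper (not from the original post): collects the block of
    // every code point in s, so a surrogate pair counts as one character.
    public static Set<UnicodeBlock> getUnicodeCodeBlocksByCodePoint(final String s) {
        Set<UnicodeBlock> result = new HashSet<UnicodeBlock>();
        s.codePoints().forEach(cp -> {
            UnicodeBlock block = UnicodeBlock.of(cp);
            // of(int) returns null when the code point is in no defined block;
            // leaving such code points out is an assumption of this sketch.
            if (block != null) {
                result.add(block);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // U+042F (Cyrillic), U+06F2 (Arabic) and U+1F600, which lies outside the BMP.
        String s = "\u042F\u06F2\uD83D\uDE00";
        System.out.println(getUnicodeCodeBlocksByCodePoint(s));
    }
}
```

For the validation use case, a returned set with more than one element indicates that the string mixes characters from different blocks. The iteration order of a HashSet is unspecified, so the blocks may print in any order.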