Nikoloogle Lindbloogle: Identifying Unicode code blocks in Java

Thursday, 22 November 2007

Identifying Unicode code blocks in Java

With the help of Java's Character.UnicodeBlock class, one can identify which Unicode code block a character belongs to. This may be useful when, for example, validating a string in order to find suspicious mixtures of characters from different code blocks (see an example in a previous post).

The following code

// U+042F is the Cyrillic capital letter Ya
Character.UnicodeBlock ub = Character.UnicodeBlock.of('\u042F');
System.out.println(ub);

// A digit two from the Arabic block
ub = Character.UnicodeBlock.of('۲');
System.out.println(ub);

outputs

CYRILLIC
ARABIC
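The calls above pass a char, which can only represent characters up to U+FFFF. For characters beyond that range, Character.UnicodeBlock.of also has an overload taking an int code point. The following is a minimal sketch of the difference; the class and method names are made up for illustration and are not part of the original post.

```java
// Hypothetical demo class, not part of the original post.
class SupplementaryBlockDemo {

    // Returns the block of the first code point in s (s must be non-empty).
    static Character.UnicodeBlock blockOfFirstCodePoint(String s) {
        return Character.UnicodeBlock.of(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+10330 GOTHIC LETTER AHSA, written as a surrogate pair.
        String gothic = "\uD800\uDF30";

        // Looking at the first char alone only sees the high surrogate.
        System.out.println(Character.UnicodeBlock.of(gothic.charAt(0)));

        // Looking at the whole code point finds the block of the actual letter.
        System.out.println(blockOfFirstCodePoint(gothic));
    }
}
```

The char overload is fine as long as the input is known to stay within the Basic Multilingual Plane; the int overload is the safer default for arbitrary text.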

This is a method returning all code blocks for the characters of a string:


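import java.lang.Character.UnicodeBlock;
import java.util.HashSet;
import java.util.Set;

// Collects the code block of every character in s.
// Note: UnicodeBlock.of returns null for a code point that belongs to no block.
Set<UnicodeBlock> getUnicodeCodeBlocks(final String s)
{
      Set<UnicodeBlock> result = new HashSet<UnicodeBlock>();
      // Step by code point rather than by char, so a surrogate pair is looked up as one character.
      for(int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i)))
      {
        result.add(UnicodeBlock.of(s.codePointAt(i)));
      }
      return result;
}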
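As a rough illustration of the validation use case mentioned at the top, the sketch below flags a string whose letters come from more than one code block. The class name, the method names and the "more than one block" rule are all assumptions made for this example, not something taken from the original post.

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not from the original post.
class MixedBlockCheck {

    // Collects the blocks of all letters in s, ignoring digits, spaces and punctuation.
    static Set<UnicodeBlock> letterBlocks(String s) {
        Set<UnicodeBlock> blocks = new HashSet<UnicodeBlock>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                blocks.add(UnicodeBlock.of(cp));
            }
            i += Character.charCount(cp);
        }
        return blocks;
    }

    // True if the letters of s come from more than one Unicode block.
    static boolean mixesBlocks(String s) {
        return letterBlocks(s).size() > 1;
    }

    public static void main(String[] args) {
        // Latin "p" and "y" around a Cyrillic "а" (U+0430) that looks like a Latin "a".
        System.out.println(mixesBlocks("p\u0430y"));
        // The same word written with Latin letters only.
        System.out.println(mixesBlocks("pay"));
    }
}
```

A check this simple is only a starting point: ordinary text such as "café" also spans two blocks (BASIC_LATIN and LATIN_1_SUPPLEMENT), so a real validator would probably need a list of block combinations it considers acceptable.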
