To do case-insensitive pattern matching on Unicode strings in Java, call Pattern.compile
with a second argument (a bitmask of flags), like this:
Pattern p =
    Pattern.compile(patternString, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
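(UNICODE_CASE is what makes this work for text beyond US-ASCII, such as Cyrillic: on its own, CASE_INSENSITIVE folds only US-ASCII letters. Even with both flags, matching may not behave as expected for Turkish, because Java's case folding ignores the locale and so does not distinguish the dotted and dotless i (İ/i and I/ı) the way Turkish does.)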
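A minimal sketch of the difference the UNICODE_CASE flag makes. The class and helper names (UnicodeCaseDemo, fullMatch) are made up for illustration. The two strings are the Russian word "политика" in lower case and "Политика" capitalised, written as Unicode escapes so the result does not depend on which encoding javac assumes for the source file.

```java
import java.util.regex.Pattern;

public class UnicodeCaseDemo {
    // Lower-case and capitalised forms of the same Cyrillic word,
    // spelled with Unicode escapes to keep this file pure ASCII
    static final String LOWER = "\u043f\u043e\u043b\u0438\u0442\u0438\u043a\u0430";
    static final String CAPITALISED = "\u041f\u043e\u043b\u0438\u0442\u0438\u043a\u0430";

    // Returns true if the whole input matches the regex compiled with the given flags
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE alone folds only US-ASCII letters,
        // so the Cyrillic capital and lower-case first letters differ
        System.out.println(fullMatch(LOWER, Pattern.CASE_INSENSITIVE, CAPITALISED));

        // Adding UNICODE_CASE makes the case folding Unicode-aware
        System.out.println(fullMatch(LOWER,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, CAPITALISED));
    }
}
```

With the standard java.util.regex implementation, main should print false and then true.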
Update: I have since learned a nicer way of doing this: start the patternString above with the embedded flag expression "(?iu)", where i stands for CASE_INSENSITIVE and u for UNICODE_CASE:
Pattern p =
    Pattern.compile("(?iu)" + patternString);
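The embedded form can be checked against the flag argument with a sketch like the one below (EmbeddedFlagsDemo and its helper methods are hypothetical names; the Cyrillic strings are again written as Unicode escapes). It also shows that "(?i)" on its own behaves like CASE_INSENSITIVE without UNICODE_CASE and does not fold Cyrillic.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagsDemo {
    // Lower-case and capitalised forms of the same Cyrillic word (Unicode escapes)
    static final String LOWER = "\u043f\u043e\u043b\u0438\u0442\u0438\u043a\u0430";
    static final String CAPITALISED = "\u041f\u043e\u043b\u0438\u0442\u0438\u043a\u0430";

    // (?iu) at the start turns on CASE_INSENSITIVE (i) and UNICODE_CASE (u)
    static Pattern withEmbeddedFlags(String patternString) {
        return Pattern.compile("(?iu)" + patternString);
    }

    // The equivalent form that passes the flags as a second argument
    static Pattern withFlagArgument(String patternString) {
        return Pattern.compile(patternString,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    public static void main(String[] args) {
        System.out.println(withEmbeddedFlags(LOWER).matcher(CAPITALISED).matches()); // true
        System.out.println(withFlagArgument(LOWER).matcher(CAPITALISED).matches());  // true

        // (?i) without u only folds US-ASCII letters
        System.out.println(Pattern.compile("(?i)" + LOWER).matcher(CAPITALISED).matches()); // false

        // Embedded flags also work where only a regex string is accepted
        System.out.println(CAPITALISED.matches("(?iu)" + LOWER)); // true
    }
}
```

One practical advantage of the embedded form is that the options travel with the pattern string itself, so it also works with methods that accept only a regex string, such as String.matches and String.replaceAll.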
2 comments:
Hi,
I am using this code:
Pattern p = Pattern.compile("(?iu)^политика.*");
Matcher m = p.matcher("Политика");
and m.matches() returns false. It does not matter whether I use "Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE" or "(?iu)". If I use "(?iu)^Политика.*" or "(?iu)^(П|п)олитика.*", it returns true.
Why?
Hi there Anonymous,
when I try your code, m.matches() returns true, as expected...
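The commenter's test can be reproduced with the sketch below (CommentReproduction and commenterTest are hypothetical names). The pattern is "(?iu)^политика.*" and the input "Политика", both spelled as Unicode escapes. If the same code returns false on another machine, one plausible cause, which is an assumption the thread does not settle, is a source-encoding mismatch: the file is saved in one encoding (say UTF-8) but javac reads it with another (say windows-1252). The lower-case п and the capital П in the two literals are then read as unrelated pairs of characters that do not fold to each other, while a pattern spelled with the very same capital letter as the input still matches exactly. That would fit the "(?iu)^Политика.*" and "(П|п)" observations. Unicode escapes sidestep the problem, and so does compiling with javac -encoding UTF-8 when the file really is UTF-8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentReproduction {
    // Unicode escapes are translated before the source is parsed, so these
    // strings are identical whatever encoding javac assumes for this file
    static final String PATTERN =
            "(?iu)^\u043f\u043e\u043b\u0438\u0442\u0438\u043a\u0430.*";
    static final String INPUT =
            "\u041f\u043e\u043b\u0438\u0442\u0438\u043a\u0430";

    // Same steps as in the comment: compile, create a matcher, call matches()
    static boolean commenterTest() {
        Pattern p = Pattern.compile(PATTERN);
        Matcher m = p.matcher(INPUT);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(commenterTest()); // true
    }
}
```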