Friday, 15 February 2008

Reading/writing non-default character encoded data in Java

When the default (system) character encoding differs from the encoding you want for your output data, you can use System.setOut and System.setErr to replace the standard streams with PrintStream objects that encode in the desired character encoding. For reading data in an encoding other than the default, you can tell the reader, e.g. the Scanner class, which character encoding to expect.
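
As a small aside, the default encoding mentioned above can be inspected at runtime. This is only a sketch: the class name DefaultCharsetCheck and the helper defaultIsUtf8 are made up for illustration, while Charset.defaultCharset() is the standard call for the JVM's default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
class DefaultCharsetCheck {

    // True when the JVM's default charset already is UTF-8.
    static boolean defaultIsUtf8() {
        return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset().name());
        System.out.println("Already UTF-8:   " + defaultIsUtf8());
    }
}
```

If the second line prints false, output written without an explicit encoding will presumably not be UTF-8 on that system.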

The following can be used for reading and writing UTF-8 data on a system whose default character encoding may not be UTF-8:


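// Needs: java.io.File, java.io.PrintStream, java.util.Scanner
// The constructors below throw checked exceptions (UnsupportedEncodingException,
// FileNotFoundException), so the enclosing method must catch or declare them.

// Re-wrap the standard streams so that their output is encoded as UTF-8
System.setOut(new PrintStream(System.out, true, "UTF-8"));
System.setErr(new PrintStream(System.err, true, "UTF-8"));

// Decode the input file as UTF-8 instead of the default encoding
Scanner scanner = new Scanner(new File(fileName), "UTF-8");

while (scanner.hasNextLine())
{
    // Read input lines
    String line = scanner.nextLine();
    line = doSomething(line);
    // Write some output to STDOUT/STDERR
    System.out.println(line);
    ...
}
scanner.close();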


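The boolean second argument of the PrintStream constructor activates autoflush. When wrapping an OutputStream with an explicit encoding, this argument cannot be left out, because there is no PrintStream(OutputStream, String) constructor; pass false if you do not want autoflush.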
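
For completeness, here is a self-contained sketch of the same idea that can be compiled and run as is. Everything named here is an assumption for illustration: the class Utf8RoundTrip, the helper methods process and roundTrip, the temporary file, and a doSomething that merely trims whitespace (the original does not say what doSomething does). The test string is written with Unicode escapes so that the source file itself does not depend on any particular encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

// Hypothetical demo class, not part of the original post.
class Utf8RoundTrip {

    // Stand-in for doSomething(): here it only trims whitespace (an assumption).
    static String doSomething(String line) {
        return line.trim();
    }

    // Reads 'input' as UTF-8 line by line and prints each processed line to 'out'.
    static void process(File input, PrintStream out) throws IOException {
        try (Scanner scanner = new Scanner(input, "UTF-8")) {
            while (scanner.hasNextLine()) {
                out.println(doSomething(scanner.nextLine()));
            }
        }
    }

    // Writes 'text' to a temporary UTF-8 file, runs process() on it and
    // returns the raw bytes that reached the UTF-8 PrintStream.
    static byte[] roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                // Autoflush is switched off (false); we flush explicitly instead.
                PrintStream out = new PrintStream(sink, false, "UTF-8");
                process(tmp.toFile(), out);
                out.flush();
                return sink.toByteArray();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Re-wrap STDOUT so it encodes as UTF-8 regardless of the default encoding.
        System.setOut(new PrintStream(System.out, true, "UTF-8"));
        byte[] bytes = roundTrip("  \u017c\u00f3\u0142w  \n");
        System.out.println(new String(bytes, StandardCharsets.UTF_8).trim());
    }
}
```

Capturing the output in a ByteArrayOutputStream is just a convenient way to look at the bytes the PrintStream actually produces; in a real program you would wrap System.out and System.err as shown earlier.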