Frequently, I process text files containing tab-separated data. Sometimes these have empty columns, i.e., two or more tabs without any data between them. More often than not, I want to keep the empty fields. However, Java's String.split drops trailing empty fields by default (empty fields in the middle of a line are kept).
To keep all the empty fields, pass a negative limit as the second argument:
String[] fields = string.split("\t", -1);
In the following example, the test string tst will be split into zero parts (result1) and four parts (result2) respectively:
String tst = "\t\t\t";
String[] result1 = tst.split("\t"); //result1.length == 0
String[] result2 = tst.split("\t", -1); //result2.length == 4
result2 will contain four instances of the empty string ("").
The same thing goes when you split a string using a pre-compiled regular expression:
// requires: import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("\t");
String[] result3 = pattern.split(tst); //result3.length == 0
String[] result4 = pattern.split(tst, -1); //result4.length == 4
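To see the effect on a line that also contains data, here is a minimal sketch. The class name SplitLimitDemo, the helper splitKeepingEmpty, and the sample row are made up for illustration; only String.split and Pattern.split come from the JDK. Note that the default call drops only the empty fields at the end of the line, while the one in the middle survives.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLimitDemo {
    // Compiled once so it can be reused for every line of a file.
    private static final Pattern TAB = Pattern.compile("\t");

    // Hypothetical helper: splits one tab-separated line and keeps
    // every field, including empty ones at the end of the line.
    static String[] splitKeepingEmpty(String line) {
        return TAB.split(line, -1);
    }

    public static void main(String[] args) {
        // Made-up row with five columns; the 2nd, 4th and 5th are empty.
        String row = "id\t\tname\t\t";

        // Default limit (0): trailing empty fields are dropped.
        System.out.println(Arrays.toString(row.split("\t")));        // [id, , name]

        // Negative limit: all five fields are kept.
        System.out.println(Arrays.toString(row.split("\t", -1)));    // [id, , name, , ]

        // Same result with the pre-compiled pattern.
        System.out.println(Arrays.toString(splitKeepingEmpty(row))); // [id, , name, , ]
    }
}
```

With the negative limit, the array length always matches the number of columns in the line, which makes it safe to index fields by column position.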
By the way, I compared the performance of the two variants above (String's split and a pre-compiled pattern matching a tab). Luckily, the difference in performance was negligible, with the compiled pattern winning by a small margin. When the split pattern is more complicated, I would expect bigger performance differences between compiled and uncompiled regular expressions. (Running Sun's java command with and without the -server argument made a big difference, however. The default client VM was significantly slower.)
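A comparison of this kind could be set up roughly as follows. This is not the code used for the measurement above, just a rough sketch: the class name SplitTiming, the sample line, and the iteration count are arbitrary assumptions, and a simple System.nanoTime loop like this is sensitive to JIT warm-up and other noise, so treat the numbers as indicative only.

```java
import java.util.regex.Pattern;

public class SplitTiming {
    private static final Pattern TAB = Pattern.compile("\t");

    // Splits the line repeatedly with String.split and returns the total
    // number of fields seen, so the work cannot be optimized away.
    static long countWithStringSplit(String line, int iterations) {
        long fields = 0;
        for (int i = 0; i < iterations; i++) {
            fields += line.split("\t", -1).length;
        }
        return fields;
    }

    // Same loop, but using the pre-compiled pattern.
    static long countWithPattern(String line, int iterations) {
        long fields = 0;
        for (int i = 0; i < iterations; i++) {
            fields += TAB.split(line, -1).length;
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "a\t\tb\tc\t\t"; // made-up sample line with 6 fields
        int iterations = 1_000_000;    // arbitrary choice

        // Warm-up runs so both variants are compiled before timing.
        countWithStringSplit(line, iterations);
        countWithPattern(line, iterations);

        long t0 = System.nanoTime();
        long n1 = countWithStringSplit(line, iterations);
        long t1 = System.nanoTime();
        long n2 = countWithPattern(line, iterations);
        long t2 = System.nanoTime();

        System.out.printf("String.split:  %d ms (%d fields)%n", (t1 - t0) / 1_000_000, n1);
        System.out.printf("Pattern.split: %d ms (%d fields)%n", (t2 - t1) / 1_000_000, n2);
    }
}
```

Both loops return the same field count, which serves as a quick sanity check that the two variants really do the same work.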
3 comments:
Thank you very much!
Thank you! I was going crazy trying to make sense of the Javadoc on this, but you answered the question I actually care about.
big thx, this is what i needed!