What is the difference between java.util.Scanner.useDelimiter() and Scanner.skip()? For example, I have these strings formatted as shown:
String line1 = "0---20.000:\t\t \t12%";
String line2 = "0--20.000:\t 12%";
String line3 = "0-20.000: \t \t12%";
String error = "0-: \t\t12%";
And I need this output:
0
20.000
12
I have to use a Scanner and a single pattern that is valid for all three of the first strings, and I need to check that there are exactly three tokens; otherwise I should throw an exception.
Could I get this output with both Scanner methods?
And which regex pattern should I use?
It has to be valid also with other numbers.
EDIT: Here is my attempt:
package scanners;

import java.util.Scanner;

public class ScannerTry {
    public static void main(String[] args) {
        String line = "0--20.000: 12%";
        Scanner scan = new Scanner(line);
        scan.useDelimiter("[-*:*\t*%]*");
        while (scan.hasNext()) {
            System.out.println(scan.next());
        }
        scan.close();
    }
}
But the output is:
0
2
0
.
0
0
0
1
2
Here's what you've specified as a delimiter:
scan.useDelimiter("[-*:*\t*%]*");
The square brackets contain a list of characters, and using them means "match a character that is in this list". The * outside the square brackets means "match 0 or more occurrences of one of these characters."
The reason you're getting one character at a time is that a pattern matching 0 or more occurrences also matches an empty string (a string of length 0). Since there is an empty string between every pair of adjacent characters in the input, the scanner finds a delimiter between each pair and treats each character as its own token. So the first thing you want to do is change that last * to +, which means "match 1 or more occurrences". Now an empty string won't match.
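To see the difference between * and + on the outside of the brackets, here is a small sketch. The class name EmptyDelimiterDemo and the helper tokens() are made up for illustration, and the input uses a tab rather than a space so that every separator is in the delimiter set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EmptyDelimiterDemo {
    // Hypothetical helper: collects every token the Scanner yields
    // for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(input)) {
            scan.useDelimiter(delimiterRegex);
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "0--20.000:\t12%";

        // '*' lets the delimiter match the empty string, so the input
        // is chopped into many single-character tokens.
        System.out.println(tokens(line, "[-:\t%]*"));

        // '+' requires at least one delimiter character between tokens.
        System.out.println(tokens(line, "[-:\t%]+")); // [0, 20.000, 12]
    }
}
```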
The second problem with your pattern is that * inside square brackets just means that an asterisk is one of the characters you match; the meaning of "0 or more" does not apply inside square brackets. In fact, whenever you have square brackets, no matter what is inside them, this pattern always matches exactly one character. So any *, +, or anything else that you want to specify as repeating needs to be outside square brackets.
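If you want to convince yourself of this, a quick check with Pattern.matches (which tests whether the whole input matches the regex) might look like the following. The class name CharacterClassDemo and its fullMatch() wrapper are just names I picked for the sketch.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Hypothetical wrapper: true if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside brackets, '*' is a literal asterisk.
        System.out.println(fullMatch("[-*]", "*"));   // true
        // A bracket expression matches exactly one character.
        System.out.println(fullMatch("[-*]", "--"));  // false
        System.out.println(fullMatch("[-*]", ""));    // false
        // Outside brackets, '*' means "0 or more".
        System.out.println(fullMatch("[-]*", "---")); // true
        System.out.println(fullMatch("[-]*", ""));    // true
    }
}
```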
If you just take out the * characters from inside the brackets and change the last one to +, you get:
scan.useDelimiter("[-:\t%]+");
Now this will match any sequence of -, :, tab, and % characters. It won't match a space, though, and I see spaces in some of your examples. So you may want to add a space inside the square brackets. Or you could say this:
scan.useDelimiter("[-:\\s%]+");
since a \s combination inside square brackets means match "any whitespace character", which includes space, tab, and a few others like newlines. (But only do this if you really do want to match the newlines.)
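Putting this together with the "exactly three tokens, otherwise throw" requirement from the question, one possible sketch is below. The class name ThreeTokenParser, the parse() method, and the choice of IllegalArgumentException are my assumptions; the question doesn't say which exception to throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ThreeTokenParser {
    // Splits the line on runs of '-', ':', whitespace, and '%'.
    // Throws if the line does not contain exactly three tokens.
    // (IllegalArgumentException is an assumed choice of exception.)
    static List<String> parse(String line) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scan = new Scanner(line)) {
            scan.useDelimiter("[-:\\s%]+");
            while (scan.hasNext()) {
                tokens.add(scan.next());
            }
        }
        if (tokens.size() != 3) {
            throw new IllegalArgumentException(
                "Expected 3 tokens but found " + tokens.size());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("0---20.000:\t\t \t12%")); // [0, 20.000, 12]
        System.out.println(parse("0--20.000:\t 12%"));      // [0, 20.000, 12]
        System.out.println(parse("0-20.000: \t \t12%"));    // [0, 20.000, 12]
        try {
            parse("0-: \t\t12%"); // only two tokens: 0 and 12
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Collecting the tokens first and checking the count afterwards keeps the delimiter pattern simple; the pattern only says what separates tokens, and the count check decides whether the line is valid.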
One other thing: you were right to put - first inside the square brackets. If you don't, it may have a different meaning:
"[a-z]"
matches any character from a to z, and it doesn't match a hyphen. However:
"[a\\-z]"
matches a, z, or hyphen. Some programmers (including me), when we want a hyphen to be in the character set, would use this backslash on the hyphen even when it isn't necessary, to avoid any possible confusion:
scan.useDelimiter("[\\-:\t%]+");
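Here is a small sketch of how the position of the hyphen changes what the brackets match. The class name HyphenPlacementDemo and the inClass() helper are hypothetical names for this example only.

```java
import java.util.regex.Pattern;

class HyphenPlacementDemo {
    // Hypothetical helper: true if the single-character input is
    // matched by the given bracket expression.
    static boolean inClass(String bracketExpr, String ch) {
        return Pattern.matches(bracketExpr, ch);
    }

    public static void main(String[] args) {
        // Between two characters, '-' defines a range.
        System.out.println(inClass("[a-z]", "m"));    // true
        System.out.println(inClass("[a-z]", "-"));    // false
        // Escaped, '-' is a literal hyphen and there is no range.
        System.out.println(inClass("[a\\-z]", "-"));  // true
        System.out.println(inClass("[a\\-z]", "m"));  // false
        // First in the brackets, '-' is also a literal hyphen.
        System.out.println(inClass("[-az]", "-"));    // true
    }
}
```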