How can I understand the output of the code below? The first four print statements are about capturing groups in Java regular expressions, and the rest of the code is about the Pattern split method. I referred to a few documents to understand the code's output (shown in the pic) but could not figure out exactly how it works and why it shows this output.
Java Code
import java.util.*;
import java.util.regex.*;
import java.lang.*;
import java.io.*;
/* Name of the class has to be "Main" only if the class is public. */
public class Codechef
{
    public static void main(String[] args) {
        //Capturing Group in Regular Expression
        System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
        System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
        System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
        System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
        // using pattern split method
        Pattern pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one@two#three:four$five");
        System.out.println(words);
        for (String s : words) {
            System.out.println("Split using Pattern.split(): " + s);
        }
    }
}
Results
Edit-1
Queries
The first console print lines...
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
utilize the matches() method, which always returns a boolean (true or false) and returns true only when the entire String matches the expression. This method is mostly used for String validation of one sort or another. The first and second examples both use the regular expression "(\\w\\d)\\1". Run that expression against the two supplied strings ("a2a2" and "a2b2") through the matches() method, as the code does, and you will get a boolean true and a false, in that order.
The real key here is knowing what that particular regular expression is supposed to validate. The expression above contains only one capturing group, which is denoted by the parentheses. The \\w matches any single word character: a-z, A-Z, 0-9 or _ (the underscore character). The \\d matches a single digit from 0 to 9.
Note: In reality the expression's metacharacters are written as \w and \d, but because the escape character (\) in Java Strings itself needs to be escaped, you have to add an additional escape character.
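To make the double escaping concrete, here is a minimal sketch (the class name EscapeDemo and its helper are made up for illustration) showing that the Java literal "\\w\\d" hands the four-character expression \w\d to the regex engine:

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Returns the regex text exactly as the regex engine receives it.
    static String regexText() {
        return Pattern.compile("\\w\\d").pattern();
    }

    public static void main(String[] args) {
        System.out.println(regexText());          // prints \w\d
        System.out.println(regexText().length()); // prints 4
    }
}
```

Each pair of backslashes in the source code becomes a single backslash in the actual String.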
The \1 is a backreference: it matches the same text that was most recently matched by the 1st capturing group. Since only one capturing group is specified, 1 is the only useful value here. A value of 0 does not work as a backreference in Java, because \0 starts an octal escape and a bare \0 is rejected with a PatternSyntaxException. A value greater than 1 refers to a capturing group that does not exist; as far as I know Java does not throw an exception for that, but such a backreference can never match, so the expression would always fail.
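Bottom line, the expression looks at the first two characters within the supplied string (a word character followed by a digit) and then requires that same pair to appear again. Basically, the expression is merely used to validate that the last two characters within the supplied String match the first two characters of the same String. Since matches() has to consume the whole String, the String must consist of exactly those four characters. This is why the second console print: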
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
returns a boolean false: b2 is not the same as a2, whereas in the first console print:
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
the last two characters a2 do indeed match the first two characters a2 and therefore boolean true is returned.
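If it helps to see what the group actually captured, the following sketch uses a Matcher instead of the static Pattern.matches() call. The class name BackrefDemo and the describe() helper are hypothetical, introduced only for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    private static final Pattern PAIR = Pattern.compile("(\\w\\d)\\1");

    // Reports what group 1 captured when the whole input matches.
    static String describe(String input) {
        Matcher m = PAIR.matcher(input);
        if (m.matches()) {
            return input + ": group 1 = " + m.group(1) + " -> true";
        }
        return input + ": no match -> false";
    }

    public static void main(String[] args) {
        System.out.println(describe("a2a2"));  // a2a2: group 1 = a2 -> true
        System.out.println(describe("a2b2"));  // a2b2: no match -> false
        // matches() must consume the entire input, so a trailing character fails too.
        System.out.println(describe("a2a2x")); // a2a2x: no match -> false
    }
}
```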
You will now notice that in the other two console prints:
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
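the regular expression used contains 2 capturing groups (two sets of parentheses). The same sort of matching applies here, but against two groups instead of one: group 1 captures AB, group 2 captures B followed by a digit, and the String must then continue with the text of group 2 (\2) followed by the text of group 1 (\1). In "ABB2B3AB" the text after the two groups starts with B3, which is not what group 2 captured (B2), so the result is false.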
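As a rough sketch of how the two groups play out (TwoGroupDemo and groups() are names invented for this example, not part of the question's code), you can pull both captures out of a Matcher:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupDemo {
    private static final Pattern P = Pattern.compile("(AB)(B\\d)\\2\\1");

    // Returns {group 1, group 2} when the whole input matches, else an empty array.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Group 1 = AB, group 2 = B2, then \2\1 demands "B2AB" to follow.
        System.out.println(Arrays.toString(groups("ABB2B2AB"))); // [AB, B2]
        // Here "B3AB" follows, so \2 (B2) cannot match.
        System.out.println(Arrays.toString(groups("ABB2B3AB"))); // []
    }
}
```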
If you want to see how these regular expressions play out and get explanations of what the expressions mean, use the regular expression tester at regex101.com. It is also a good regular expressions resource.
Pattern.split():
In this case, the use of the Pattern.split() method is a little overkill in my opinion, since String.split() also accepts regular expressions, but it does have its purpose in other areas. Nevertheless, it is a good example of how it can be used. The split() method is used here to break up the String supplied to it wherever the regular expression compiled into the Pattern matches, which in this case is "\\W" (otherwise: \W). The \W (uppercase W) means 'match any non-word character', that is, any character that is not a-z, A-Z, 0-9 or _. This expression is basically the opposite of "\w" (with the lowercase w). The characters @, #, :, and $ contained within the supplied String (and yes, a comma, semicolon, exclamation mark, etc. would count as well):
"one@two#three:four$five"
are considered non-word characters and therefore the split is carried out on any one of them resulting in a String Array containing:
[one, two, three, four, five]
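One thing worth noting about the question's code: System.out.println(words) prints the array's type and hash code rather than its contents. A small sketch, with SplitDemo and splitWords() as illustrative names only, shows the difference using Arrays.toString():

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Splits the input on every single non-word character.
    static String[] splitWords(String input) {
        return Pattern.compile("\\W").split(input);
    }

    public static void main(String[] args) {
        String[] words = splitWords("one@two#three:four$five");
        // Prints something like [Ljava.lang.String;@1b6d3586 (the hash varies).
        System.out.println(words);
        // Prints the elements: [one, two, three, four, five]
        System.out.println(Arrays.toString(words));
    }
}
```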
The very same thing can be accomplished using the String.split() method, since this method also allows a regular expression to be applied:
String[] s = "one@two#three:four$five".split("\\W");
or even:
String[] s = "one@two#three:four$five".split("[@#:$]");
or even:
String[] s = "one@two#three:four$five".split("@|#|:|\\$");
// The $ character is a reserved RegEx symbol and therefore
// needs to be escaped.
or on and on and on...
Yup... "\\W" is easier since it covers all non-word characters. ;)
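To check that these variants really are interchangeable for this input, here is a quick sketch (SplitVariants and its methods are hypothetical names used only for this comparison):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitVariants {
    static final String INPUT = "one@two#three:four$five";

    // True when String.split() with the given regex gives the same
    // array as Pattern.compile("\\W").split() for INPUT.
    static boolean sameAsPatternSplit(String regex) {
        String[] viaPattern = Pattern.compile("\\W").split(INPUT);
        return Arrays.equals(viaPattern, INPUT.split(regex));
    }

    public static void main(String[] args) {
        System.out.println(sameAsPatternSplit("\\W"));        // true
        System.out.println(sameAsPatternSplit("[@#:$]"));     // true
        System.out.println(sameAsPatternSplit("@|#|:|\\$"));  // true
    }
}
```

The explicit character lists only behave the same as "\\W" as long as the input contains no other non-word characters.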