Tags: java, regex, java-8, metacharacters

Java Regex Metacharacters returning extra space while splitting


I want to split a string using a regex instead of StringTokenizer, so I am using String.split(regex). The regex contains metacharacters, and when I include \[ in it, the returned array contains an extra space.

import java.util.Scanner;
public class Solution{
    public static void main(String[] args) {
        Scanner i= new Scanner(System.in);
        String s= i.nextLine();
        String[] st=s.split("[!\\[,?\\._'@\\+\\]\\s\\\\]+");
        System.out.println(st.length);
        for(String z:st)
            System.out.println(z);
    }
}

When I enter the input [a\m], it reports an array length of 3 and prints

 a m  

There is also a space before a. Can anyone please explain why this is happening and how I can correct it? I don't want the extra space in the resulting array.


Solution

  • Since the [ is at the very beginning of the string, the first split step produces two elements: the empty string that precedes the [, and the rest of the string. That leading empty string is what shows up as the "extra space" in the output. String#split only discards trailing empty elements (it runs with limit=0 by default), so the leading one is kept.

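    Remove the delimiter characters from the start of the string before splitting, using .replaceAll("^[!\\[,?._'@+\\]\\s\\\\]+", "") (note the ^ at the beginning of the pattern). Inside a character class, . and + do not need to be escaped, so those backslashes are dropped here. Here is some sample code you can use: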

    String[] st="[a\\m]".replaceAll("^[!\\[,?._'@+\\]\\s\\\\]+", "")
                     .split("[!\\[,?._'@+\\]\\s\\\\]+");
    System.out.println(st.length);
    for(String z:st) {
        System.out.println(z);
    }
    

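To make the behaviour described in this answer concrete, here is a minimal, self-contained sketch. The class name SplitDemo, the helper splitWords, and the constants DELIMS and LEADING_DELIMS are hypothetical names chosen for illustration. The sketch uses Pattern#split and Matcher#replaceFirst in place of String#split and String#replaceAll, and the early return for input made up only of delimiters is an added assumption that is not part of the answer's code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Same delimiter class as in the answer, compiled once for reuse.
    private static final Pattern DELIMS =
            Pattern.compile("[!\\[,?._'@+\\]\\s\\\\]+");
    // Anchored variant that only matches delimiters at the start of the input.
    private static final Pattern LEADING_DELIMS =
            Pattern.compile("^[!\\[,?._'@+\\]\\s\\\\]+");

    // Hypothetical helper: strip leading delimiters, then split.
    public static String[] splitWords(String input) {
        String trimmed = LEADING_DELIMS.matcher(input).replaceFirst("");
        // Assumption: input made up only of delimiters should yield no tokens.
        // Without this guard, splitting "" would return a single empty string.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return DELIMS.split(trimmed);
    }

    public static void main(String[] args) {
        String input = "[a\\m]"; // the characters [a\m]

        // Plain split keeps the leading empty string but drops the trailing one.
        System.out.println(Arrays.toString(DELIMS.split(input))); // [, a, m]

        // Stripping the leading delimiters first removes the empty element.
        System.out.println(Arrays.toString(splitWords(input)));   // [a, m]
    }
}
```

The first printed array shows the empty element in front of a, which is what appears as the extra blank in the question's output. The ] at the end of the input produces no matching element because trailing empty strings are discarded.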