Tags: java, regex, string, split, tokenize

String regex is not working to split words enclosed in parentheses


I am working on a regex to split the string below into tokens:

String input = "( Customer.browse == \"Car Loan\" ) AND ( Campaign.period BETWEEN 2400 AND 600 ) "
            + "AND ( Customer.eligibity == TRUE ) AND ( Campaign.campaign_name == \"Browse To Start\") "
            + "AND ( Customer.application_started == \"Car Loan\" ) AND ( Time.currenttime BETWEEN 800 AND 2000 ) "
            + "THEN ( Notification.message == SUPPRESS)";

My string tokenizer class is as follows:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

    public class StringRegexClass {

        public ArrayList<String> stringTokenizer(String str) {

            ArrayList<String> tokenList = new ArrayList<String>();
            Pattern pattern = Pattern.compile("[(\")]|\\w+.\\w+.\\w+|\\w+.\\w+|==");
            Matcher matcher = pattern.matcher(str);
            while (matcher.find()) {
                tokenList.add(matcher.group());
            }
            return (tokenList);
        }
    }

When I pass the string to the class above, I get the output below:

[Image: screenshot of the current output]

I want the contents of parenthesized groups such as ( Time.currenttime BETWEEN 800 AND 2000 ) and ( Campaign.period BETWEEN 2400 AND 600 ) to be split as below:

[Image: screenshot of the desired output]

I have tried several approaches, but none of them worked. Can you please suggest what changes I need to make to my regex to make it work?


Solution

  • I would recommend you capture the quoted string in full.

    You need to escape the . in your regex; unescaped, it matches any character, including the spaces between words.
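
    To illustrate the point about the dot, here is a minimal sketch that runs the word alternatives from the question's regex and an escaped variant against one clause of the input. The class name DotEscapeDemo and the helper findAll are made up for this illustration and are not part of the question's code.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DotEscapeDemo {
        // Hypothetical helper: collects every match of the regex in the input.
        public static List<String> findAll(String regex, String input) {
            List<String> out = new ArrayList<>();
            Matcher m = Pattern.compile(regex).matcher(input);
            while (m.find()) {
                out.add(m.group());
            }
            return out;
        }

        public static void main(String[] args) {
            String input = "Campaign.period BETWEEN 2400 AND 600";

            // Unescaped '.': it also matches the spaces, so words are glued together.
            System.out.println(findAll("\\w+.\\w+.\\w+|\\w+.\\w+", input));
            // prints [Campaign.period BETWEEN, 2400 AND 600]

            // Escaped '\\.': only a literal dot joins two words into one token.
            System.out.println(findAll("\\w+(?:\\.\\w+)?", input));
            // prints [Campaign.period, BETWEEN, 2400, AND, 600]
        }
    }
    ```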

    You can use the following regex, but be aware that it will silently skip anything it doesn't recognize:

    [()]|"[^"]*"|\w+(?:\.\w+)?|==
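
    If silently skipping unrecognized input is a concern, one possible approach is to check the gaps between consecutive matches and fail when a gap contains anything other than whitespace. The sketch below assumes that behavior is wanted; the class StrictTokenizer and the methods strictTokenize and requireBlank are hypothetical names, and the >= operator in the usage example is just a sample of something the regex does not cover.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class StrictTokenizer {
        // Same token pattern as the regex shown above.
        private static final Pattern TOKEN =
                Pattern.compile("[()]|\"[^\"]*\"|\\w+(?:\\.\\w+)?|==");

        // Hypothetical helper: tokenizes with the pattern above, but throws
        // if any non-whitespace text between matches would be skipped.
        public static List<String> strictTokenize(String str) {
            List<String> tokens = new ArrayList<>();
            Matcher m = TOKEN.matcher(str);
            int end = 0; // end index of the previous match
            while (m.find()) {
                requireBlank(str, end, m.start());
                tokens.add(m.group());
                end = m.end();
            }
            requireBlank(str, end, str.length()); // check the tail as well
            return tokens;
        }

        // Throws if str[from, to) contains anything other than whitespace.
        private static void requireBlank(String str, int from, int to) {
            String gap = str.substring(from, to).trim();
            if (!gap.isEmpty()) {
                throw new IllegalArgumentException(
                        "Unrecognized input at index " + from + ": " + gap);
            }
        }

        public static void main(String[] args) {
            // Every character is covered by a token or whitespace: no exception.
            System.out.println(strictTokenize("( Campaign.period BETWEEN 2400 AND 600 )"));
            try {
                // ">=" is not matched by the pattern, so this call throws.
                strictTokenize("( Customer.age >= 18 )");
            } catch (IllegalArgumentException e) {
                System.out.println(e.getMessage());
            }
        }
    }
    ```

    Whether to throw, log, or collect the unrecognized text is a design choice; throwing is used here only because it makes the skipped input hard to miss.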
    

    In Java 7+ (the diamond operator <> requires Java 7):

    public static List<String> stringTokenizer2(String str) {
        List<String> tokenList = new ArrayList<>();
        Pattern pattern = Pattern.compile("[()]|\"[^\"]*\"|\\w+(?:\\.\\w+)?|==");
        for (Matcher matcher = pattern.matcher(str); matcher.find(); )
            tokenList.add(matcher.group());
        return tokenList;
    }
    

    In Java 9+:

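    // Needs java.util.regex.MatchResult and java.util.stream.Collectors;
    // Matcher.results() was added in Java 9.
    public static List<String> stringTokenizer(String str) {
        return Pattern.compile("[()]|\"[^\"]*\"|\\w+(?:\\.\\w+)?|==").matcher(str)
                .results().map(MatchResult::group).collect(Collectors.toList());
    }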
    

    Test

    String input = "( Customer.browse == \"Car Loan\" ) AND ( Campaign.period BETWEEN 2400 AND 600 ) AND ( Customer.eligibity == TRUE ) AND ( Campaign.campaign_name == \"Browse To Start\") AND ( Customer.application_started == \"Car Loan\" ) AND ( Time.currenttime BETWEEN 800 AND 2000 ) THEN ( Notification.message == SUPPRESS)";
    for (String token : stringTokenizer(input))
        System.out.println(token);
    

    Output

    (
    Customer.browse
    ==
    "Car Loan"
    )
    AND
    (
    Campaign.period
    BETWEEN
    2400
    AND
    600
    )
    AND
    (
    Customer.eligibity
    ==
    TRUE
    )
    AND
    (
    Campaign.campaign_name
    ==
    "Browse To Start"
    )
    AND
    (
    Customer.application_started
    ==
    "Car Loan"
    )
    AND
    (
    Time.currenttime
    BETWEEN
    800
    AND
    2000
    )
    THEN
    (
    Notification.message
    ==
    SUPPRESS
    )