
Java Regex Tokenizing


I'm new to regex.

Let's say I have a string:

String toMatch = "TargetCompID=NFSC_AMD_Q\n" +
            "\n" +
            "## Bin's verifix details";

Which shows up in a .cfg file as:

TargetCompID=NFSC_AMD_Q

## Bin's verifix details

I want to tokenize this into an array as:

{"TargetCompID", "NFSC_AMD_Q", "## Bin's verifix details"}

My current code, which doesn't print anything:

static void regexTest(String regex, String toMatch) {
    Pattern patternTest = Pattern.compile(regex);
    Matcher matcherTest = patternTest.matcher(toMatch);
    while (matcherTest.find()) {
        for (int i = 1; i <= matcherTest.groupCount(); i++) {
            System.out.println(matcherTest.group(i));
        }
    }
}

public static void main(String[] args) throws Exception {
    String regex = "^[^=]+.*$" + "|" + "^#+.*$";
    String toMatch = "TargetCompID=NFSC_AMD_Q\n" +
            "\n" +
            "## Bin's verifix details";

    String testRegex = ".*";
    String testToMatch = "   ###  Bin";
    regexTest(regex, toMatch);
    System.out.println("----------------------------");

    // regexTest(testRegex, testToMatch);
}

EDIT

while (matcherTest.find()) {
    for (int i = 1; i < matcherTest.groupCount(); i++) {
        System.out.println(matcherTest.group(i));
    }
}

prints:

TargetCompID
NFSC_AMD_Q

but not

## Bin's verifix details

Why?

Also, this code:

while (matcherTest.find()) {
    System.out.println(matcherTest.group());
}

only prints

TargetCompID=NFSC_AMD_Q

## Bin's verifix details

Are TargetCompID and NFSC_AMD_Q not separated because we're not calling group(i)? And why is a blank line printed?


Solution

  • You can use this regex in Java:

    (?m)^([^=]+)=(.+)\R+^(#.*)
    

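    RegEx breakdown:

    • (?m): Enable MULTILINE mode, so ^ matches at the start of every line
    • ^([^=]+)=: Match from the start of a line up to the first = and capture that text in group #1, then match the = itself
    • (.+): Match the rest of the line and capture it in group #2
    • \R+: Match one or more line breaks
    • ^(#.*): Match a full line starting with # and capture it in group #3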
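
As a rough sketch of how this regex could be used to build the token list, the snippet below collects every capture group into a list. The class name ConfigTokenizer and the method tokenize are made up for illustration and are not part of the question or the answer. Two details are worth noting. Groups are numbered from 1 to groupCount() inclusive, so the loop bound has to be <=; with three groups, a loop using i < groupCount() stops after group 2 and never reaches the # line. Also, group() with no argument returns the whole match (including the line breaks consumed by \R+), not the separate pieces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigTokenizer {
    // Regex from the answer above; \R (any line break) needs Java 8 or later.
    private static final Pattern ENTRY =
            Pattern.compile("(?m)^([^=]+)=(.+)\\R+^(#.*)");

    // Hypothetical helper: collects every capture group of every match, in order.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            // Groups run from 1 to groupCount() inclusive, hence <= and not <.
            for (int i = 1; i <= m.groupCount(); i++) {
                tokens.add(m.group(i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String toMatch = "TargetCompID=NFSC_AMD_Q\n" +
                "\n" +
                "## Bin's verifix details";
        for (String token : tokenize(toMatch)) {
            System.out.println(token);
        }
    }
}
```

Run against the string from the question, this prints TargetCompID, NFSC_AMD_Q and ## Bin's verifix details on three separate lines.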