Groovy String Tokenizer matching on the wrong delimiter


I have a String with a value like "inactiveCatalog%2CwrongTransaction", and I want to split it into a list on the "%2C".

String rseFiltersToRemove = "inactiveCatalog%2CwrongTransaction"
ArrayList rseFiltersToRemoveList = rseFiltersToRemove.tokenize("%2C")

I was expecting the list to have 2 elements ("inactiveCatalog" and "wrongTransaction"), but it turns out to have 3 ("inactive", "atalog" and "wrongTransaction").

So it thinks the "C" in "inactiveCatalog" is a delimiter.

How could this be when I set the delimiter to "%2C"?


Solution

  • The tokenize() method treats each character of its String argument as a separate delimiter. So, .tokenize("%2C") splits on %, 2 and C.

    Note that you do not get empty elements between %, 2 and C, because tokenize() discards the empty strings that would otherwise appear when delimiters occur in succession.

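    You need split(), which treats the entire string as a single delimiter (it is interpreted as a regular expression, which makes no difference here because %2C contains no regex metacharacters):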

    ArrayList rseFiltersToRemoveList = rseFiltersToRemove.split('%2C');
    // => [inactiveCatalog, wrongTransaction]
    

    See the online Groovy demo.
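
To make the difference concrete, here is a minimal sketch that runs both methods on the string from the question. The variable names byChars and byString are illustrative only, and the use of Pattern.quote is an optional precaution that is not part of the original answer; it only matters when the delimiter could contain regex metacharacters.

```groovy
import java.util.regex.Pattern

// Input taken from the question.
String input = 'inactiveCatalog%2CwrongTransaction'

// tokenize(String) treats '%', '2' and 'C' as three separate delimiters
// and drops the empty tokens between adjacent delimiters.
List<String> byChars = input.tokenize('%2C')
assert byChars == ['inactive', 'atalog', 'wrongTransaction']

// split(String) treats its argument as one regular expression and returns a String[].
// Pattern.quote (an added precaution) makes the delimiter match literally.
List<String> byString = input.split(Pattern.quote('%2C')) as List
assert byString == ['inactiveCatalog', 'wrongTransaction']

// Unlike tokenize, split keeps empty strings between consecutive delimiters.
assert ('a%2C%2Cb'.split('%2C') as List) == ['a', '', 'b']
assert 'a%2C%2Cb'.tokenize('%2C') == ['a', 'b']
```

If empty entries between consecutive delimiters should be preserved, split() is the one to use; tokenize() silently drops them.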