Tags: java, regex, cql

I need a regex that splits a CQL string into substrings at occurrences of AND and OR outside quotes, taking escaped quotes into account


In this example source string:

index1 = "searchterm1" AND (index2 any "\"value2.1\" \"value2.2 AND sometext\" \"value2.3 OR sometext\"") OR index3 = "searchterm3"

The source needs to be split at the following bold text:

index1 = "searchterm1" AND (index2 any "\"value2.1\" \"value2.2 AND sometext\" \"value2.3 OR sometext\"") OR index3 = "searchterm3"

I expect this to be the result:

match 1 with group 1: index1 = "searchterm1"

match 2 with group 1: AND and group 2:(index2 any "\"value2.1\" \"value2.2 AND sometext\" \"value2.3

match 3 with group 1: OR and group 2: sometext\"") OR index3 = "searchterm3"

I tried this:

\b(AND|OR)(?=([^\"]*\"[^\"]*\")*[^\"]*$)

but those escaped quotes are giving me a hard time.

EDIT:

Another example:

index1 = "searchterm1" AND (index2 any "\"value2.1\" \"value2.2 AND sometext\" \"value2.3 OR sometext\"")) OR (index3 = "searchterm3" AND (index4 any "\"value4.1\" \"value4.2 OR sometext\" \"value4.3 AND sometext\"") AND index5 = "searchterm5"

where it should be split at the following bold text:

index1 = "searchterm1" AND (index2 any "\"value2.1\" \"value2.2 AND sometext\" \"value2.3 OR sometext\"")) OR (index3 = "searchterm3" AND (index4 any "\"value4.1\" \"value4.2 OR sometext\" \"value4.3 AND sometext\"") AND index5 = "searchterm5"


Solution

  • You can use the following regex:

    (AND|OR|^).*?(?:\1.*?)*(?=(AND|OR|$))
    

    It will match:

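    • (AND|OR|^): AND, OR, or the start of the string, captured as group 1
    • .*?: as few characters as possible, up to whatever the rest of the pattern requires
    • (?:\1.*?)*: optionally, and repeated zero or more times, the same text that group 1 captured (a backreference) followed by as few other characters as possible
    • (?=(AND|OR|$)): a lookahead requiring that AND, OR, or the end of the string comes next, without consuming it; the operator found there is captured as group 2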

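A minimal Java sketch of how the pattern above might be applied with `java.util.regex`. The class name `CqlSplitDemo`, the method `split`, and the constant `SAMPLE` are illustrative and not part of the original answer; `SAMPLE` is the first example string from the question, escaped for a Java string literal. Each match is trimmed before it is collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CqlSplitDemo {
    // The pattern from the answer; the backreference \1 becomes \\1 in a Java literal.
    static final Pattern OPERATOR_CHUNK =
            Pattern.compile("(AND|OR|^).*?(?:\\1.*?)*(?=(AND|OR|$))");

    // First example from the question (assumed here to contain no line breaks).
    static final String SAMPLE =
            "index1 = \"searchterm1\" AND (index2 any \"\\\"value2.1\\\" "
            + "\\\"value2.2 AND sometext\\\" \\\"value2.3 OR sometext\\\"\") "
            + "OR index3 = \"searchterm3\"";

    // Collects every non-empty match of the pattern, with surrounding whitespace removed.
    static List<String> split(String cql) {
        List<String> chunks = new ArrayList<>();
        Matcher m = OPERATOR_CHUNK.matcher(cql);
        while (m.find()) {
            String chunk = m.group().trim();
            if (!chunk.isEmpty()) {
                chunks.add(chunk);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints three lines:
        //   index1 = "searchterm1"
        //   AND (index2 any "\"value2.1\" \"value2.2 AND sometext\" \"value2.3
        //   OR sometext\"") OR index3 = "searchterm3"
        for (String chunk : split(SAMPLE)) {
            System.out.println(chunk);
        }
    }
}
```

On this input the sketch reproduces the three matches listed in the question. The pattern itself never inspects quote characters: a chunk ends at the next operator that differs from the one it started with, so it is worth testing against other inputs before relying on it.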