Search code examples
looker

Using REGEXP_MATCH, cannot categorize data with text string


In Looker Studio, I'm using REGEXP_MATCH() to categorize a column of data from an NPS survey into Promoters, Passives, and Detractors. The options are as follows: "0 (not at all likely)", "1", "2", "3", "4", "5", "6", "7", "8", "9", or "10 (extremely likely)". I have no problems correctly categorizing the purely numerical values into my desired categories, however my 0 and 10 values with affiliated text are not getting picked up by the REGEXP_MATCH() expression. I have tried many variations (below) and none of them are working correctly.

REGEXP_MATCH(How likely would you be to recommend this offering to others?, r'extremely likely') ...r'^[0-9]+\s*(extremely likely)$' ...r'^10\s*(extremely likely)$|^9$' ...r'^10\s*(extremely likely)$|9' ...r'^9|10 (extremely likely)$' ...r'^(9|10 (extremely likely))$' ...“(9|.10.)” ..."(9|.extremely likely.)" I have tried all of these, among others. Every one of these options will correctly categorize the "9" as a promoter, but none of the "10 (extremely likely)" responses. What am I missing?


Solution

  • You need to escape the parentheses around the extremely likely string, otherwise it gets understood as a capturing group. See the doc of the re2 library.
    As a consequence something like r'9|(10 \(extremely likely\))' should be working fine.
    I recommend experimenting on RegExps using https://regex101.com/ which is pretty easy to use and explains what your current RegExp is looking for.