I am creating reports in Google's Data Studio, and I have successfully created several custom dimensions in the past using REGEXP_MATCH
on the Keyword
dimension combined with CASE
statements to create the dimensions I need. This one has me stumped.
I have data coming in through the Keyword
dimension that contains a substring that I would like to extract and display as a custom dimension.
A subset of the keyword data coming through looks like this:
09172018_rp_ws_1_og_
img s4_ac_p_act_
img s5_ws_5_m_
img s4_ws_5_m_
I am trying to use REGEXP_EXTRACT
to create a new calculated field called Image type
that is a dimension that groups all entries with starting with img, followed by a space, and then any alphanumeric afterwards ending with an underscore. So all entries with img s4
would be grouped together, img s5
would be grouped together. Anything in the keyword dimension without that pattern can be left out of the dataset entirely.
I am not able to get any results except null
using REGEXP_EXTRACT
.
Even just trying REGEXP_EXTRACT(Keyword, '.*img.*')
yields null when entering in the formula for the new calculated field.
What is stumping me is I tried the following just to see if my syntax was off, and this formula does return results (just not what I want as the image types are not aggregated).
CASE
WHEN (REGEXP_MATCH(Keyword, '.*img.*')) THEN Keyword
ELSE "Not Set"
END
Any idea where I am going wrong? I can't get any output out of REGEXP_EXTRACT(Keyword, 'your reg expression here')
no matter what I enter.
Mind that in order to extract any text from REGEXP_EXTRACT
, you should define a capturing group inside the regex pattern. In short, enclose the part you need to extract with a pair of unescaped parentheses.
Now, to match img
at the start of the string you need to use ^
anchor, it matches the start of a string position.
To match 1 or more chars, use +
.
So, you may use any of the following depending on your actual rules:
REGEXP_EXTRACT(Keyword, '^img ([a-zA-Z0-9_]+)')
REGEXP_EXTRACT(Keyword, '^img\\s+(\\w+)')
REGEXP_EXTRACT(Keyword, '^img\\s+(.+)')
Details
^
- start of stringimg
- a literal substring([a-zA-Z0-9_]+)
- Capturing group 1: one or more letters, digits or _
\s+
- 1 or more whitespaces\w+
- 1 or more word chars: letters, digits or _
.+
- 1 or more chars other than line break chars.