Search code examples
regexlooker-studio

regex extract in google data studio


I am trying to extract values (in this case max and min price ) from page google analytics dimension but having a problem since the position for the max/min price can vary depending on the search that the user took.

I tried : REGEXP_EXTRACT(Page,'.*minPrice.=(.*)')

It works fine when the value is at end of the string but catches the rest of the string when the value is at the beginning or in the middle.

output

Here is an example of page values:

/search/detail/119441465?s[propertyTypes][0]=house&spropertyTypes=condo&s[minPrice]=999999&s[locations][0][city]=Ocean City&s[locations][0][state]=NJ

/search?s[propertyTypes][0]=house&spropertyTypes=condo&s[minPrice]=999000&s[orderBy][0]=r&s[locations][0][city]=Barnegat Light&s[locations][0][state]=NJ

/search/detail/118588431?s[propertyTypes][0]=house&spropertyTypes=condo&s[minPrice]=999000&s[locations][0][city]=Barnegat Light&s[locations][0][state]=NJ

/search/detail/118588431?s[city]=Barnegat Light&s[state]=NJ&s[propertyTypes][0]=house&spropertyTypes=condo&s[minPrice]=999000

/search/detail/119438690?s[orderBy]=sourceCreationDate,desc&s[locations][0][city]=Venice&s[locations][0][state]=FL&s[maxPrice]=525000&s[minPrice]=99900

/search?s[orderBy]=sourceCreationDate,desc&s[locations][0][city]=Venice&s[locations][0][state]=FL&s[maxPrice]=525000&s[minPrice]=99900

/search/detail/119337213?s[orderBy]=sourceCreationDate,desc&s[locations][0][state]=NJ&s[locations][0][county]=Monmouth County&s[maxPrice]=1488000&s[minPrice]=992000

/search/detail/119415983?s[orderBy]=sourceCreationDate,desc&s[locations][0][state]=NJ&s[locations][0][county]=Monmouth County&s[maxPrice]=1488000&s[minPrice]=992000

/search/detail/119046654?s[orderBy]=sourceCreationDate,desc&s[locations][0][state]=NJ&s[locations][0][county]=Gloucester County&s[maxPrice]=148800&s[minPrice]=99200

/search?s[orderBy]=sourceCreationDate,desc&s[locations][0][state]=NJ&s[locations][0][county]=Gloucester County&s[maxPrice]=148800&s[minPrice]=99200

/search/detail/119330920?s[orderBy]=sourceCreationDate,desc&s[locations][0][state]=NJ&s[locations][0][county]=Gloucester County&s[maxPrice]=148800&s[minPrice]=99200


Solution

  • You can use

    REGEXP_EXTRACT(Page,'[[]minPrice[]]=([0-9]+)')
    REGEXP_EXTRACT(Page,r'\[minPrice]=(\d+)')
    

    See the regex demo. Details:

    • [[]minPrice[]]= / \[minPrice]= - [minPrice]= string
    • ([0-9]+) / \d+ - Group 1: one or more digits.

    The .* is not necessary as the REGEXP_EXTRACT does not anchor the matches at the start of string.