Search code examples
regexlooker-studiore2

Using REGEXP to extract specific text between slashes from URL


We started using Google Data Studio to visualize our data, and we need RegEx to help us extract a specific piece of text from the URL list from our system.

URL example:

/town/articletype/46646-this-is-an-example-article

What we need from the URL by using RegEx:

  • /town/ (without slashes, and eventually capitalized the first letter, if possible)
  • /articletype/ (also without slashes)
  • /46646- (without / and - && this is the article ID we also need)
  • -this-is-an-example-article (without "-" and capitalized first letter)

We tried numerous RegEx, and we managed to extract /town/ from the URL by using the following Calculated Field:

REGEXP_EXTRACT(Page , '/(.*?)(/)')

Solution

  • The 4 Calculated Fields below do the trick:

    1) Town

    CONCAT(UPPER(REGEXP_EXTRACT(Page , "^/(\\w{1})")), LOWER(REGEXP_EXTRACT(Page , "^/\\w{1}([^/]*)")))
    

    2) articletype

    REGEXP_EXTRACT(Page , "^/\\w+/([^/]*)")
    

    3) 46646

    REGEXP_EXTRACT(Page , "^/\\w+/\\w+/([^-]*)")
    

    4) This is an example article

    CONCAT(UPPER(REGEXP_EXTRACT(Page , "/\\w+/\\w+/\\d+-(\\w{1}).*$")), LOWER(REGEXP_REPLACE(REGEXP_EXTRACT(Page , "/\\w+/\\w+/\\d+-\\w{1}(.*)$"), "-", " ")))
    

    Google Data Studio Report and a GIF to elaborate: