Search code examples
regexlooker-studiore2

How can I extract an ID from an URL in Datastudio?


My task is to create a new Dimension that contains all IDs.

I`ve got 50 IDs that I need to Extract from URLs. The URLs are in the dimension "seite"(page). I already know all the IDs (example IDs: S121314, s232425). The URL’s look like this: www.website.ch/de/cms/seite/a-lot-of-text-stands-here-S121314a340921 Not all URLs have an ID and the second part of the ID here "a340921" should be left out.

I tried this and a few other things, but nothing works:

New Dimension "ID":

CASE
WHEN Seite IN ("S143254","S162892","S110454","S... otherIDs") THEN "hat ID"
ELSE "keine ID"
END

As a result, I get that all 13000 URLs have "keine ID". I know that this WHEN IN code won’t Extract me the IDs I just wanted to see if I could find the IDs in the URLs which failed.

If anyone could help me that would be great. If anything with the question is wrong or information is missing please tell me.


Solution

  • Create a new field with this formula (I'll call this new field seite_id):

    REGEXP_EXTRACT(seite, '-(S[0-9]+)')
    

    For this regex, I am supposing all your IDs:

    • appear in the URL after a "-" (dash);
    • starts with the letter "S";
    • contains only numbers after the "S";
    • stops after the first non-number character (the small "a", for example).

    If my guesses are correct, this regex should do the job and you'll have a new field with only the page ID. After this, your statement should work:

    CASE
      WHEN seite_id IN ("S143254","S162892","S110454","S... otherIDs") THEN "hat ID"
      ELSE "keine ID"
    END