Search code examples
google-analyticslooker-studiocase-statement

Why will custom dimensions not work in case statements. Data studio, Google analytics


I am having an issue in Google data studio. I am creating a case statements that looks for keywords in a few different dimensions and categorises them accordingly. This all works fine until I add a custom dimensions. The see statement is valid as far as Google is concerned but it returns nothing.

CASE 
WHEN CONTAINS_TEXT(PAGE, "Heart") THEN "Cardiology"
WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
End

The above works fine

CASE 
    WHEN CONTAINS_TEXT(PAGE, "Heart") THEN "Cardiology"
    WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
WHEN WEB_SUBJECT = "Hypertension" THEN "Cardiology"
END

The above is valid but is blank when used. Web_subject is a custom dimensions defined in Google analytics.

Any help greatly appreciate. This one has me stumped.

UPDATE: So changing the format of the custom dimension expression to be REGEX worked in returning results for that particular condition but it now does not show results for any others. I have a feeling it may because of the data sitting behind it all.

A little more context, the data source in this case is a roll up google analytics account, So it contains analytics from 10+ domains. What I am attempting to do is group together page views and users from different domains into subjects using various different dimensions. For example if a USER selects the filter 'Diabetes' I want to return all pageviews for: Domain A where page title contains x Domain B where customDimesnion 1 = Y etc

This is my current mess of a CASE statement:

    case
when contains_text(Page, 'biosimilars-confidence') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Oncologie).*') OR CONTAINS_TEXT(Page, 'oncology') OR contains_text(Page, 'rrmm-challenges') OR  contains_text(Page, 'oncologyhighlights2020') OR CONTAINS_TEXT(Page, 'ASCO') OR CONTAINS_TEXT(Page Title, 'cancer') OR contains_text(Page Title, 'Oncology') OR  contains_text(Page Title, 'ASCO') THEN "Oncology"
when CONTAINS_TEXT(Page, 'DOMAINB') OR contains_text(Page, 'eprint') THEN "Eprint"
when CONTAINS_TEXT(Page, 'diabetes') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Diabetes).*') OR contains_text(Page, 'hypoglycemia') OR contains_text(Page, 'glp1ras') OR contains_text(Page Title, 'Diabetes') OR contains_text(Page Title, 'hyperglycemia') OR contains_text(Page Title, 'diabetes') OR contains_text(Page Title, 'ADA') OR contains_text(Page Title, 'GLP-1') then "Diabetes/Endocrinology"
when CONTAINS_TEXT(Page, 'cardiology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Cardiologie).*') OR contains_text(Page Title, 'Cardiology') OR contains_text(Page Title, 'Heart failure') then'Cardiology'
when contains_text(Page, 'gastro') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Gastro-enterologie).*') OR contains_text(Page Title, 'Gastroenterology') then "Gastroenterology"
when contains_text(Page, 'rheumatology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Reumatologie).*') OR contains_text(Page Title, 'Rheumatology') OR contains_text(Page Title, 'adalimumab') OR CONTAINS_TEXT(Page Title, 'arthritis') OR CONTAINS_TEXT(Page Title, 'EULAR') OR contains_text(Page Title, "joint")  then "Rheumatology"
when CONTAINS_TEXT(Page, 'migraine') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Neurologie).*') OR contains_text(Page Title, 'Neurology') OR contains_text(Page Title, "Ataxia") OR contains_text(Page Title, "EAN 2020") then "Neurology"
when Page = 'DOMAINA.com' OR contains_text(Page, 'training') OR Page = 'ime.DOMAINA.com' then "Corporate site traffic"
when contains_text(Page Title, 'Gynaecology ') then "Gynaecology "
when CONTAINS_TEXT(Page Title, 'Dermatology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Dermatologie).*') OR contains_text(Page Title, 'eczema') OR contains_text(Page Title, 'Laser hair removal') then "Dermatology"
when CONTAINS_TEXT(Page Title, 'COVID-19') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Coronavirus (COVID-19)).*') then "COVID-19"
WHEN CONTAINS_TEXT(Page Title, 'General practice') then "General practice"
when contains_text(Page Title, 'Haematology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Hematologie).*') OR contains_text(Page Title, 'EHA25') then "Haematology"
when contains_text(Page Title, 'Mental health') OR CONTAINS_TEXT(Page Title, 'Psychology') then 'Mental health'
when contains_text(Page Title, 'penile length') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Urologie).*') then "Urology"
when contains_text(Page Title, 'ERS 2019') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Longziekten).*') THEN "Lung diseases"
when contains_text(Page, 'noonan')Then "Genetics"
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Farmacie).*') then 'Pharmacy'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Heelkunde).*') then 'Surgery'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Huisartsgeneeskunde).*') then 'Family medicine'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Infectieziekten).*') then 'Infectious Diseases'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Overig).*') then 'Other/Uncategorized'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Voeding).*') then 'Nutrition'
else 'Other/Uncategorized'
end

I have anonymised some bits so if it looks strange that is why. It may well be I am going about this in completely the wrong way. One other thing to note, the domain names are stored in the PAGE dimension.


Solution

  • Update (Expanded CASE)

    0) Original Expanded CASE

    It may be due to:

    contains_text(Page Title, 'ADA')
    

    Adding a Word Boundary around ADA may do the trick, ensuring that words that simply contain ADA are not captured into the respective WHEN clause

    REGEXP_MATCH(Page Title, ".*(?i)(\\bADA\\b).*")
    

    An alternative is to follow the two step process below:

    1) Field_CONCAT

    Create the following Data Source-level Calculated Field to CONCAT the fields into one single consolidated field:

    CONCAT(Page, ", ", WEBSITE subject, ", ", Page Title)
    

    2) New_CASE

    Create the CASE statement below:

    • Field: Where Field_CONCAT represents the field created above;
    • | OR |: The | Pipe operators | serve as the RegEx version of OR
    • Escape Sequence: \\ is used to escape special RegEx characters such as (, |, ., etc;
    • Word Boundaries: Added a word boundary on either side of ADA (as well as ASCO) to ensure that text that simply contain ada are not part of the Diabetes/Endocrinology WHEN clause:
    CASE
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(biosimilars-confidence|Oncologie|oncology|rrmm-challenges|oncologyhighlights2020|\\bASCO\\b|cancer).*") THEN "Oncology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(DOMAINB|eprint).*") THEN "Eprint"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(diabetes|hypoglycemia|glp1ras|hyperglycemia|\\bADA\\b|GLP-1).*") THEN "Diabetes/Endocrinology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(cardiology|Cardiologie|Heart failure).*") THEN 'Cardiology'
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(gastro|Gastro-enterologie|Gastroenterology).*") THEN "Gastroenterology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(rheumatology|Reumatologie|adalimumab|arthritis|EULAR|joint).*") THEN "Rheumatology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(migraine|Neurologie|Neurology|Ataxia|EAN 2020).*") THEN "Neurology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(DOMAINA\\.com|training|ime\\.DOMAINA\\.com).*") THEN "Corporate site traffic"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Gynaecology).*") THEN "Gynaecology "
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Dermatology|Dermatologie|eczema|Laser hair removal).*") THEN "Dermatology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(COVID-19|Coronavirus \\(COVID-19\\)).*") THEN "COVID-19"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(General practice).*") THEN "General practice"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Haematology|Hematologie|EHA25).*") THEN "Haematology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Mental health|Psychology).*") THEN 'Mental health'
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(penile length|Urologie).*") THEN "Urology"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(ERS 2019|Longziekten).*") THEN "Lung diseases"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(noonan).*") THEN "Genetics"
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Farmacie).*") THEN 'Pharmacy'
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Heelkunde).*") THEN 'Surgery'
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Huisartsgeneeskunde).*") THEN 'Family medicine'
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Infectieziekten).*") THEN 'Infectious Diseases'
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Overig).*") THEN 'Other/Uncategorized'
      WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Voeding).*") THEN 'Nutrition'
      ELSE 'Other/Uncategorized'
    END
    

    Added a New Page to the Google Data Studio Report and a GIF to demonstrate:

    Original POST

    0) ELSE NULL

    Note that the CASE statements in the question and this suggestion have not explicitly stated the ELSE clause, thus by default, ELSE values are treated as NULL.

    1) Tweaked CASE

    Tried out the CASE statement and it works as expected! One tweak to the initial CASE statement is the inclusion of a Logical Operator, OR:

    CASE
      WHEN CONTAINS_TEXT(PAGE, "Heart") OR WEB_SUBJECT = "Hypertension" THEN "Cardiology"
      WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
    END
    

    2) Alternative CASE

    In addition, you could also have a look to see whether the CASE statement below resolves the issue; it uses the REGEXP_MATCH function and ensures that fields Contain .* the respective values as well as adding a Case Insensitive Flag (?i) thus matching Hypertension, hypertension, HYPerTENsion, etc:

    CASE
      WHEN REGEXP_MATCH(PAGE, ".*(?i)(Heart).*") OR REGEXP_MATCH(WEB_SUBJECT, ".*(?i)(Hypertension).*") THEN "Cardiology"
      WHEN REGEXP_MATCH(PAGE, ".*(?i)(Diabetes\\.website\\.com).*") THEN "Diabetes"
    END
    

    3) Next Steps and a GIF

    If the above doesn't resolve the issue, could you elaborate with a screenshot(s) and provide some sample data (removing or replacing any sensitive information); feel free to add data to This Editable Google Sheet (also used in the Report and GIF below).

    Google Data Studio Report and a GIF to elaborate on the above: