Search code examples
regexlooker-studiore2

How to extract a number marked with specific word from text string in Google Data Studio


I have a text string that contains several instances of numbers, example of the data is below.

I am trying to extract the number of keys (the number that is followed by the text -keys). I tried several REGEXP_EXTRACT patterns without luck.

{38-keys,fit-out,closed,eddy-tamy,datechange-feb2025,staff-onboard,sandy-brighton,open-dec2020}

{same-year,budgeted,signature-done,mark-picard,hotel,5-keys}

{active,building,itsa-signed,2322-keys,pending-signature,next-year-(construction),opening-feb2024}


Solution

  • 0) Summary

    • Use #1 OR #2 OR #3 below (added two additional suggestions as the author received an error with #1 and then #2 in regards to the CAST function which may be a result of spacing, Data Source (PostgreSQL) specific or another issue).

    1) With CAST

    It can be achieved by using the Calculated Field below (where Field represents the respective field name) which extracts all digits immediately preceding -keys; the CAST function was added to ensure that the field Type is Number (the default field Type of REGEXP_EXTRACT values are Text); also, for future reference, the Raw Input Literal R was used so that \d represents the sequence for a digit vs the default (without R) in Google Data Studio, which requires two back slashes (\\d):

    CAST(REGEXP_EXTRACT(Field, R"(\d+)-keys") AS NUMBER )
    

    Editable Google Data Studio Report (Embedded Google Sheets Data Source) and a GIF to elaborate:

    2) Without CAST

    Create the Calculated Field field below at the Data Source and change the field Type from Text to Number; in regards to the difference between the Calculated Field below and and the one above, the one below removes the CAST function:

    REGEXP_EXTRACT(Field, R"(\d+)-keys")
    

    Added a New Page to the Editable Google Data Studio Report (Embedded Google Sheets Data Source) and a GIF to demonstrate:

    3) Two Calculated Fields

    First create the the RegEx_Field (3.1 - Support Field) and then the CAST_Field (3.2 - Primary Field, where RegEx_Field represents the name of the respective field created in 3.1):

    3.1) RegEx_Field (Support field, thus hide field at the Data Source if required)

    REGEXP_EXTRACT(Field, R"(\d+)-keys")
    

    3.2) CAST_Field (The field that will be used in the Report)

    CAST(RegEx_Field as NUMBER)
    

    Added another Page to the Editable Google Data Studio Report (Embedded Google Sheets Data Source) and a GIF to visualise the process: