Search code examples
stringgoogle-bigqueryregexp-replace

Amend part of string using BigQuery REGEXP_REPLACE


I wonder whether someone may be able to help me please.

My eventinfo.eventLabel field in my bigquery table contains the following data:

View View Code red for the BDAT group

SELECT
  #select all fields excluding those under the hits record
  * EXCEPT (hits),
  #start array - this rebuilds the hit record
  ARRAY(
  SELECT
    #unnest the hit field, select each field excluding those under the page record
    AS STRUCT * EXCEPT (eventInfo ),
    (
    SELECT
      #select all page fields excluding pageTitle
      AS STRUCT eventInfo.* EXCEPT (eventLabel),
      #remove the query parameter from the pagePath fields
      REGEXP_REPLACE(eventinfo.eventLabel, r'View\sView .*', 'View View redacted') AS eventLabel) AS eventinfo
  FROM
    UNNEST(hits) ) AS hits
FROM
  `bigquery.Tested.ga_sessions_20180801` 

I'm trying to remove the "Code red" element of the data and replace it with the term "redacted".

Please note that "Code red" is not the only value and the number of characters can increase and decrease. But the "View View" and "for the" are constant throughout my data.

I know that the problem is the REGEXP REPLACE line, and I can manage to put the "redacted" text into the field, but I've not been able to remove the "Code red" text.

Could someone possibly look at this and offer some guidance on how I may be able to change this.

Many thanks and kind regards

Chris


Solution

  • select 
      [
      REGEXP_REPLACE('View View Code red for the BDAT group', r'View View .*? for the', 'View View redacted for the'),
      REGEXP_REPLACE('View View Code blue for the BDAT group', r'View View .*? for the', 'View View redacted for the'),
      REGEXP_REPLACE('View View red code for the BDAT group', r'View View .*? for the', 'View View redacted for the')
      ]
    

    enter image description here