Search code examples
google-bigquery

BigQuery - Replace whitespace in string


My hair is going grey over here. I have a table with a column where i want to remove the whitespace. Expected result: "8 800 000 kr" -> "8800000kr"

I can't get this to work. Please see query output below:

Image removed by the author because it contained explicit language.

Why is the white space not removed?

When i perform the query against mock data it works: enter image description here

What could be wrong with the string in the first example? Everywhere i look i looks identical to the mocked string in example 2.


Solution

  • Below example is for BigQuery Standard SQL and explains your problem

    #standardSQL
    with `project.dataset.table` as (
      select 'with spaces' space_type, '8 800 000 kr' slutpris union all
      select 'with non-breaking spaces', replace('8 800 000 kr', chr(32), chr(160)) slutpris
    )
    select space_type, slutpris,
      replace(slutpris, ' ', ''),
      regexp_replace(slutpris, r'\s', ''),
      regexp_replace(slutpris, r'\s|kr', '') 
    from `project.dataset.table`    
    

    with output

    enter image description here

    So, as you can see - non-breaking space is not recognized as a space character or any white space

    Forgot to mention possible solution -

    #standardSQL
    with `project.dataset.table` as (
      select 'with spaces' space_type, '8 800 000 kr' slutpris union all
      select 'with non-breaking spaces', replace('8 800 000 kr', chr(32), chr(160)) slutpris
    )
    select space_type, slutpris,
      translate(slutpris, chr(32) || chr(160), ''),
      regexp_replace(slutpris, '[\u00A0\\s]', ''),
      regexp_replace(slutpris, '[\u00A0\\s]|kr', '')
    from `project.dataset.table` 
    

    with output

    enter image description here