Search code examples
sqlcastinggoogle-bigquery

CAST a column contain empty cells, NULL and String to FLOAT64 in BigQuery


I am encountering a problem with casting strings to float that I haven't been able to solve. I have a column in a BQ table contain values (numbers as string), NULLs adn empty cells. Now, I want to CAST the values to FLOAT64 but I get a double value error. I tried:

CAST(myfield AS FLOAT64) as myfield

and thinking it might help to replace empty field by NULL:

CAST(REGEXP_REPLACE(myfield, " ", NULL) AS FLOAT64 ) as myfield

But this sets everything to NULL.

Any idea on how to solve this would be greatly appreciated.


Solution

  • Based on a comment you made to another question, it is clear that spaces are not your only problem in your data.

    You need to identify and fix the problem data (which should ideally be done before ingestion in to the database, but that's a different question).

    For example, you could use safe_cast():

    SAFE_CAST() doesn't throw an error, instead it returns NULL, so you can just check for those NULLs...

    SELECT
      *
    FROM
      yourTable
    WHERE
          myfield IS NOT NULL
      AND SAFE_CAST(REPLACE(myfield, ' ', '') AS FLOAT64) IS NULL
    

    Or, just use the SAFE_CAST() in your query (instead of CAST()) and accept that badly formed data will yield a NULL

    • That prevents runtime errors
    • But 'hides' bad data as NULLs

    EDIT:

    Also, you can make your REPLACE() more robust by stripping other white space characters too (such as tabs, carriage returns, etc).