Search code examples
typesgoogle-bigquerycastinguser-defined-functionstype-coercion

Bigquery udf case statement coerce different data types


When we injest data into bigquery, we injest everything as a string, regardless of source API data type.

So if I want to cast a column back to its correct type then I would write something like cast(cast(col as float64) as int64). I would much rather just call a function test_cast(col,'int64'). This would save so much time and be easier to read.

I would like to create a udf in bigquery that can cast my data as the correct data type given datatype as a parameter. This is what I have tried.

create or replace function test_cast(col String, data_type string) as ( case when lower(data_type) = 'int64' then cast(cast(col as float64) as int64) when lower(data_type) = 'string' then cast(col as string) else null end )

The function I actually wrote, goes through all the data types. so the response code in bigquery looks like this:

No matching signature for operator CASE; all THEN/ELSE arguments must be coercible to a common type but found: BOOL, DATE, FLOAT64, INT64, STRING, TIMESTAMP; actual argument types (WHEN THEN) ELSE: (BOOL DATE) (BOOL STRING) (BOOL INT64) (BOOL FLOAT64) (BOOL TIMESTAMP) (BOOL BOOL) NULL .


Solution

  • Each potential outcome of the case-when statement needs to be of the same type.

    You could simplify your function to only use it on the "double" casting.

    CREATE TEMP FUNCTION floatString_to_integer(col String) 
    AS
    ( 
      CAST(CAST(col as float64) as int64) 
    );
    
    SELECT floatString_to_integer("23.23")