When we ingest data into BigQuery, we ingest everything as a string, regardless of the source API's data type.
So if I want to cast a column back to its correct type, I have to write something like cast(cast(col as float64) as int64). I would much rather just call a function like test_cast(col, 'int64'). This would save a lot of time and be easier to read.
I would like to create a UDF in BigQuery that casts my data to the correct data type, given the data type as a parameter. This is what I have tried:
CREATE OR REPLACE FUNCTION test_cast(col STRING, data_type STRING) AS (
  CASE
    WHEN LOWER(data_type) = 'int64'  THEN CAST(CAST(col AS FLOAT64) AS INT64)
    WHEN LOWER(data_type) = 'string' THEN CAST(col AS STRING)
    ELSE NULL
  END
);
The function I actually wrote goes through all the data types, so the error BigQuery returns looks like this:
No matching signature for operator CASE; all THEN/ELSE arguments must be coercible to a common type but found: BOOL, DATE, FLOAT64, INT64, STRING, TIMESTAMP; actual argument types (WHEN THEN) ELSE: (BOOL DATE) (BOOL STRING) (BOOL INT64) (BOOL FLOAT64) (BOOL TIMESTAMP) (BOOL BOOL) NULL .
Every potential outcome of a CASE expression must be coercible to a common type, so a single function cannot return INT64 from one branch and TIMESTAMP from another.
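One possible workaround (a sketch, not part of the original answer; the function name normalize_numeric is made up for illustration): make every branch of the CASE return the same type by converting the result back to STRING, then apply a single cast at the call site when you need the typed value:

```
-- Sketch: every branch yields STRING, so the CASE type-checks.
-- The caller applies one final CAST to the desired type.
CREATE OR REPLACE FUNCTION normalize_numeric(col STRING, data_type STRING) AS (
  CASE
    WHEN LOWER(data_type) = 'int64'
      THEN CAST(CAST(CAST(col AS FLOAT64) AS INT64) AS STRING)
    WHEN LOWER(data_type) = 'float64'
      THEN CAST(CAST(col AS FLOAT64) AS STRING)
    ELSE col
  END
);

-- Usage: the typed cast happens once, outside the UDF:
-- SELECT CAST(normalize_numeric('23.23', 'int64') AS INT64);
```

This keeps the per-type logic in one place, at the cost of one extra CAST wherever the typed value is consumed.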
You could simplify your function so it handles only the "double" cast:
CREATE TEMP FUNCTION floatString_to_integer(col STRING)
AS (
  -- Cast via FLOAT64 first so strings like "23.23" don't fail the INT64 cast
  CAST(CAST(col AS FLOAT64) AS INT64)
);

SELECT floatString_to_integer("23.23");
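A related point, assuming some of your ingested strings may not be valid numbers (an assumption, not stated in the question): a plain CAST fails the whole query on the first unparseable value, whereas BigQuery's SAFE_CAST returns NULL for that row instead:

```
CREATE TEMP FUNCTION floatString_to_integer_safe(col STRING)
AS (
  -- SAFE_CAST yields NULL instead of raising an error on bad input
  SAFE_CAST(SAFE_CAST(col AS FLOAT64) AS INT64)
);

SELECT floatString_to_integer_safe("23.23");         -- 23
SELECT floatString_to_integer_safe("not a number");  -- NULL
```

This is often the safer default when cleaning string-typed ingestion tables, since one malformed row won't abort the whole cast.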