Search code examples
sqlgoogle-bigquerybigquery-udfspring-cloud-gcp-bigquery

How to extract all (including int and float) numerical values in a string column in Google BigQuery?


I have a table Table_1 on Google BigQuery which includes a string column str_column. I would like to write a SQL query (compatible with Google BigQuery) to extract all numerical values in str_column and append them as new numerical columns to Table_1. For example, if str_column includes first measurement is 22 and the other is 2.5; I need to extract 22 and 2.5 and save them under new columns numerical_val_1 and numerical_val_2. The number of new numerical columns should ideally be equal to the maximum number of numerical values in str_column, but if that'd be too complex, extracting the first 2 numerical values in str_column (and therefore 2 new columns) would be fine too. Any ideas?


Solution

  • Consider below approach

    select * from (
      select str_column, offset + 1 as offset, num
      from your_table, unnest(regexp_extract_all(str_column, r'\b([\d.]+)\b')) num with offset
    )
    pivot (min(num) as numerical_val for offset in (1,2,3))    
    

    if applied to sample data like in your question - output is

    enter image description here