Search code examples
google-bigquery

Splitting string with three parts into two when only separated by spaces - Bigquery


I am creating a database of names in BigQuery. The source data provides the name in this format: last first middle (aka, no commas, only spaces).

What I want to end up with is last, first middle (essentially inserting a comma after the last name).

At first, I thought I'd need to do a REGEXP_EXTRACT after the first space.

Essentially:

CONCAT(SPLIT(REGEXP_EXTRACT(Teacher_Name, r'^[^ ]*'),' ')) AS Teacher_Name

...but that's not working as I get the error "Query error: Grouping by expressions of type ARRAY is not allowed at [21:25]".


Solution

  • There is no SPLIT function needed.

    SELECT REGEXP_EXTRACT(Teacher_Name, r'[^ ]*$') ||', ' || REGEXP_EXTRACT(Teacher_Name, r'^[^ ]*') as Teacher_Name
    FROM unnest(['Ana J Hope','Thomas Hope','this is no name']) as Teacher_Name
    

    Please consider using a JavaScript UDF, because it has a more powerful regex syntax and enables to fix upper and lower cases. Also a sanity check could be included.