Search code examples
sqlregexsplitdatabricks

Databricks SQL regex_extract/split field before and after specific wordword


I am working on Databricks SQL and trying to parse strings into separate columns.

Here is an example that is working for me:

regexp_extract('100-200', '(\\d+)-(\\d+)', 1)
returns: 100

My specific example is splitting a column called match_name (i.e Detroit Lions at New York Jets) so I can get home and away team into their own separate columns (i.e AWAY = Detroit Lions, HOME = New York Jets)

Here is some code I have tried to implement against the match_name column in my SQL query to attempt to split at the word "at" - none of it currently works

split(match_name,'[at]', 1),
regexp_extract(match_name,'(\\d+)at(\\d+)', 1),

Any help is appreciated!


Solution

  • I've run into this frustration with Databricks as well, as I thought split_part was a valid function. At any rate, here's one way to use regexp_extract...

    with my_table as (
    select 'Team A at Team B' as game union all
    select 'Atlanta at Dallas'
    )
    select game, 
     regexp_extract(game, "(.*) at (.*)", 1) as away_team, 
     regexp_extract(game, "(.*) at (.*)", 2) as home_team
    from my_table
    

    Output:

    game away_team home_team
    Team A at Team B Team A Team B
    Atlanta at Dallas Atlanta Dallas