Search code examples
sqlamazon-redshift

How to split the full name column into first, middle and last name columns in SQL redshift


We got a requirement in the redshift table to split the full name column into first_name, middle_name, last_name columns with the help of space in the string using the SQL script. Below is the sample data and expected output for the columns. The first_name and last_name are deriving fine but getting issues for the middle_namewe with the below SQL, it removes the strings in middle name which is the common in other two columns hence it is not working for a few scenarios below are examples

Can you please help us fix this issue?

enter image description here

SQL Query: "Select fullname , SUBSTRING(fullname , 1, CHARINDEX(' ', fullname) - 1) as FirstName, RTRIM(LTRIM(REPLACE(REPLACE(fullname,SUBSTRING(fullname , 1, CHARINDEX(' ', fullname) - 1),''), REVERSE( LEFT( REVERSE(fullname), CHARINDEX(' ', REVERSE(fullname))-1 ) ),'')))as MiddleName, REVERSE( LEFT( REVERSE(fullname), CHARINDEX(' ', REVERSE(fullname))-1 ) ) as LastName From (select 'john johnson' fullname)"

enter image description here

enter image description here

enter image description here

enter image description here


Solution

  • I'd use REGEXP_SUBSTR for this. In my opinion this gets this whole thing more readable:

    select
      fullname,
      regexp_substr(full_name, '^[^ ]+') as first_name,
      trim(' ' from regexp_substr(full_name, ' .* ')) as middle_name,
      regexp_substr(full_name, '[^ ]+$') as last_name
    from mytable;
    

    Explanation of the regular expressions:

    • '^[^ ]+' = non-blank characters directly after string start
    • ' .* ' = first blank and last blank and all characters inbetween
    • '^[^ ]+' = non-blank characters directly before string end

    https://docs.aws.amazon.com/de_de/redshift/latest/dg/REGEXP_SUBSTR.html https://docs.aws.amazon.com/de_de/redshift/latest/dg/r_TRIM.html