Search code examples
sql-serversql-server-2008ssisssis-2008

TSQL - Split delimited String into columns



The Problem

I have a number of filename strings that I want to parse into columns using a tilda as delimiter. The strings take on the static format:

  • Filepath example C:\My Documents\PDF
  • Surname example Walker
  • First Name example Thomas
  • Birth Date example 19991226
  • Document Created Datetime example 20180416150322
  • Document Extension example .pdf

So a full concatenated example would be something like:

C:\My Documents\PDF\Walker~Thomas~19991226~20180416150322.pdf

I want to ignore the file path and extension given in the string and only parse the following values into columns:

  • Surname, First Name, Birth Date, Document Created Datetime

So something like:

SELECT Surname = --delimitedString[0]
       FirstName = --delimitedString[1]
       --etc.

What I have tried

I know that I have several tasks I would need to perform in order to split the string, first I would need to trim off the extension and file path so that I can return a string delimited by tildas (~).

This is problem one for me, however problem 2 is splitting the new delimted string itself i.e.

Walker~Thomas~19991226~20180416150322

Ive had a good read through this very comprehensive question and It seems (as im using SQL Server 2008R2) the only options are to use either a function with loops or recursive CTE's or attempt a very messy attempt using SUBSTRING() with charIndex().

Im aware that If I had access to SQL Server 2016 I could use string_split but unfortunately I cant upgrade.

I do have access to SSIS but im very new to it so decided to attempt the bulk of the work within a SQL statement


Solution

  • I know you mentioned wanting to avoid the charindex() option if at all possible, but I worked it out in a hopefully semi-readable way. I find it somewhat easy to read complex functions like this when I space each parameter on a different line and use indent levels. It's not the most proper looking, but it helps with legibility:

    with string as (select 'C:\My Documents\PDF\Walker~Thomas~19991226~20180416150322.pdf' as filepath)
    
    select 
        substring(
            filepath,
            len(filepath)-charindex('\',reverse(filepath))+2,           --start location, after last '\'
            len(filepath)-                                              --length of path
                (len(filepath)-charindex('\',reverse(filepath))+2)-         --less characters up to last '\'
                (len(filepath)-charindex('.',filepath))                     --less file extention
            )
    from string