Search code examples
sql-serversubstringcross-applycharindex

Split string in single column (& multiple rows) into multiple columns in SQL Server (missing data exists)


Following is how my data looks like:

In one column & several rows:

[column]
variable1:A,variable2:B,variable3:C,variable4:D,...,variable9:I
variable1:A,variable2:B,variable4:D,.....variable7:G
variable8:H,variable9:I
variable9:I
.
.
.
.

and I am trying to split as such:

variable1 | variabel2 | variable3| variable4| ...variable7| variable8|varaible9
A           B              C            D           G         H           I
A           B                           D     
                                                              H           I
                                                                          I

I am working on this code:

--Approach 1. cross apply

select Finaldata.*

from [dbo].[table_name]
cross apply ( select str = [dbo].[table_name].attributes + ',,' ) f1
cross apply ( select p1 = charindex( ',', str ) ) ap1
cross apply ( select p2 = charindex( ',', str, p1 + 1 ) ) ap2
cross apply ( select p3 = charindex( ',', str, p2 + 1 ) ) ap3
cross apply ( select p4 = charindex( ',', str, p3 + 1 ) ) ap4
cross apply ( select p5 = charindex( ',', str, p4 + 1 ) ) ap5
cross apply ( select p6 = charindex( ',', str, p5 + 1 ) ) ap6
cross apply ( select p7 = charindex( ',', str, p6 + 1 ) ) ap7
cross apply ( select p8 = charindex( ',', str, p7 + 1 ) ) ap8
cross apply ( select p9 = charindex( ',', str, p8 + 1 ) ) ap9

cross apply ( select variable1 = substring( str, 1, p1-1 )                   
             ,variable2 = substring( str, p1+1, p2-p1-1 )
             ,variable3 = substring( str, p2+1, p3-p2-1 )
             ,variable4 = substring( str, p3+1, p4-p3-1 )
             ,variable5 = substring( str, p4+1, p5-p4-1 )
             ,variable6 = substring( str, p5+1, p6-p5-1 )
             ,variable7 = substring( str, p6+1, p7-p6-1 )
             ,variable8 = substring( str, p7+1, p8-p7-1 )
             ,variable9 = substring( str, p8+1, p9-p8-1 )
      )Finaldata


--Approach 2. substring & charindex
--strings are in column 'attributes'

substring(attributes, charindex(attributes, 'variable1'),charindex(attributes, ',')-1) variable1
substring(attributes, charindex(attributes, 'variable2'),charindex(attributes, ',')-1) variable2
substring(attributes, charindex(attributes, 'variable3'),charindex(attributes, ',')-1) variable3
substring(attributes, charindex(attributes, 'variable4'),charindex(attributes, ',')-1) variable4
substring(attributes, charindex(attributes, 'variable5'), charindex(attributes, ',')-1) variable5 

which I get same error message:

Invalid length parameter passed to the LEFT or SUBSTRING function.

I tried other codes using syntax substring & charindex, parsename etc that are already on internet, but they are not working. I assume it is because I am trying to split string into many columns & there are missing data to be considered.

Please help me!!!!

Thanks!!


Solution

  • This answer uses a SplitString function to split the input strings. This function was taken from this answer on the question Efficient query to split a delimited column into a separate table. Slightly modified for different separator (.,).

    CREATE TABLE #tt(id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,col VARCHAR(4000));
    INSERT INTO #tt(col)VALUES
    ('variable1:A,variable2:B,variable3:C,variable4:D,variable9:I'),
    ('variable1:A,variable2:B,variable4:D,variable7:G'),
    ('variable8:H,variable9:I'),
    ('variable9:I');
    
    SELECT
        variable1=ISNULL(variable1,''),
        variable2=ISNULL(variable2,''),
        variable3=ISNULL(variable3,''),
        variable4=ISNULL(variable4,''),
        variable5=ISNULL(variable5,''),
        variable6=ISNULL(variable6,''),
        variable7=ISNULL(variable7,''),
        variable8=ISNULL(variable8,''),
        variable9=ISNULL(variable9,'')
    FROM
        (
            SELECT 
                id,
                v_col=LEFT(item,CHARINDEX(':',item)-1),
                v_val=SUBSTRING(item,CHARINDEX(':',item)+1,LEN(item))
            FROM 
                #tt 
                CROSS APPLY dbo.SplitStrings(col)
        ) AS s
        PIVOT (
            MAX(v_val) FOR
            v_col IN (variable1,variable2,variable3,variable4,variable5,variable6,variable7,variable8,variable9)
        ) AS p
    
    DROP TABLE #tt;
    

    Result:

    +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
    | variable1 | variable2 | variable3 | variable4 | variable5 | variable6 | variable7 | variable8 | variable9 |
    +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
    | A         | B         | C         | D         |           |           |           |           | I         |
    | A         | B         |           | D         |           |           | G         |           |           |
    |           |           |           |           |           |           |           | H         | I         |
    |           |           |           |           |           |           |           |           | I         |
    +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+