Search code examples
sqlsql-serverstored-proceduresdatabase-normalizationdenormalization

comparable varchar "arrays" in seperate fields but on same row


I have a table that looks like this:

memberno(int)|member_mouth (varchar)|Inspected_Date (varchar)
-----------------------------------------------------------------------------
12           |'1;2;3;4;5;6;7'       |'12-01-01;12-02-02;12-03-03' [7 members]

So by looking at how this table has been structured (poorly yes)

The values in the member_mouth field is a string that is delimited by a ";"
The values in the Inspected_Date field is a string that is delimited by a ";"

So - for each delimited value in member_mouth there is an equal inspected_date value delimited inside the string

This table has about 4Mil records, we have an application written in C# that normalizes the data and stores it in a separate table. The problem now is because of the size of the table it takes a long time for this to process. (the example above is nothing compared to the actual table, it's much larger and has a couple of those string "array" fields)

My question is this: What would be the best and fastest way to normilize this data in MSSQL proc? let MSSQL do the work and not a C# app?


Solution

  • The best way will be SQL itself. The way followed in the below code is something which worked for me well with 2-3 lakhs of data.

    I am not sure about the below code when it comes to 4 Million, but may help.

    Declare @table table
    (memberno int, member_mouth varchar(100),Inspected_Date varchar(400))
    
    Insert into @table Values 
    (12,'1;2;3;4;5;6;7','12-01-01;12-02-02;12-03-03;12-04-04;12-05-05;12-07-07;12-08-08'),
    (14,'1','12-01-01'),
    (19,'1;5;8;9;10;11;19','12-01-01;12-02-02;12-03-03;12-04-04;12-07-07;12-10-10;12-12-12')
    
    Declare @tableDest table
    (memberno int, member_mouth varchar(100),Inspected_Date varchar(400))
    

    The table will be like.

    Select * from @table
    

    enter image description here

    See the code from here.

    ------------------------------------------
    Declare @max_len int,
            @count int = 1
    
    Set @max_len =  (Select max(Len(member_mouth) - len(Replace(member_mouth,';','')) + 1)
                    From    @table)
    
    While @count <= @max_len
    begin
        Insert  into @tableDest
        Select  memberno,
                SUBSTRING(member_mouth,1,charindex(';',member_mouth)-1),
                SUBSTRING(Inspected_Date,1,charindex(';',Inspected_Date)-1)
        from    @table
        Where   charindex(';',member_mouth) > 0
        union   
        Select  memberno,
                member_mouth,
                Inspected_Date
        from    @table
        Where   charindex(';',member_mouth) = 0
    
        Delete from @table
        Where   charindex(';',member_mouth) = 0
    
        Update  @table
        Set     member_mouth =  SUBSTRING(member_mouth,charindex(';',member_mouth)+1,len(member_mouth)),
                Inspected_Date = SUBSTRING(Inspected_Date,charindex(';',Inspected_Date)+1,len(Inspected_Date))
        Where   charindex(';',member_mouth) > 0
    
        Set     @count = @count + 1
    End
    ------------------------------------------
    Select  * 
    from    @tableDest
    Order   By memberno
    ------------------------------------------
    

    Result.

    enter image description here