Search code examples
sqlexcelms-access

Merging quasi-duplicate rows by moving values to new columns


I have a table with thousands of rows that looks somewhat like this:

name sub_text sub_code other_data
a text1 code1 a_data
a text2 code2 a_data
a text3 code3 a_data
a text4 code4 a_data
b text1 code1 b_data
b text2 code2 b_data
c c_data

These records represent people and some of them have 4 or 2 rows where only the sub_text and the sub_code values differ. There are tens of other_data columns (person's ID, birth date and place, etc) that are the same for those 4 or 2 rows. There are also rows that don't have any duplicates in which case the sub_text and sub_code values are empty.

I would like to convert this table into something like this:

name sub_text1 sub_text2 sub_text3 sub_text4 sub_code1 sub_code2 sub_code3 sub_code4 other_data
a text1 text2 text3 text4 code1 code2 code3 code4 a_data
b text1 text2 code1 b_data
c c_data

I need to sort of "merge" the records so one person only has one record by adding the only differing values to new columns, while leaving the other_data fields and the people who have only one record untouched.

This is originally an Excel table and I would also need the "converted" table in Excel. I don't think Excel can do something like that so I created a database (I only have it in Access now) hoping I could do more with SQL. I can't figure out how to solve this though. Any ideas?


Solution

  • In Power Query, available in Windows Excel 2010+ and Office 365, you can do this fairly easily by

    • Group by the duplicated columns (name and other_data in your example)
    • Then extract the combined subtable code and text columns into a delimited list
    • Split that list into new columns

    To use Power Query

    • Select some cell in your Data Table
    • Data => Get&Transform => from Table/Range
    • When the PQ Editor opens: Home => Advanced Editor
    • Make note of the Table Name in Line 2
    • Paste the M Code below in place of what you see
    • Change the Table name in line 2 back to what was generated originally.
    • Read the comments and explore the Applied Steps to understand the algorithm

    M-Code

    let
        Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
    
    //Group by Name and Other Data
    //  extract a delimited list of the sub_text and sub_codes
        #"Grouped Rows" = Table.Group(Source, {"name", "other_data"}, {
            {"sub_text", each Text.Combine([sub_text],";"), type text},
            {"sub_code", each Text.Combine([sub_code],";"), type text}    
            }),
    
    //split the delimited lists into separate columns.
        #"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "sub_text", 
                    Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv)),
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "sub_code", 
                    Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv))
    in
        #"Split Column by Delimiter1"
    

    enter image description here