Search code examples
excelsyntaxpowerquerydata-cleaningm

Writing a query that can have excel fill in and consolidate unpopulated fields within a group of duplicates?


I have a contact info dataset (large) that contains lots of semi-duplicate rows that I'd like to condense into as little rows as possible. Attached is a sample of what I'm talking about.

data sample

The blue table on the left is smaller scale example of what I'm currently working with. The orange table on the right is what I would like the table to look like.

I want to write a query that will be able to select an ID that has multiple rows, and within that selection, assess whether values can be moved into a parent row that has unpopulated cells (see ID "4" and how I condensed those three rows of data into one by filling in blanks and consolidating duplicates).

An important point of emphasis is how to perform this task without it being a blanket statement for all duplicates in the entire worksheet. Ultimately I want to perform this task for the entire worksheet, but I want excel to first isolate a single ID and then execute the aforementioned task, rather than evaluating the criteria based on all duplicate IDs. ((If that makes sense))

One other condition I would like to have is for certain columns where multiple rows under the same ID have different values, is to allocate that data into a subsequent column (see Tags & Tags2 columns under ID "1") instead of overriding a cell.

  • I only want to do this ^ for 2 columns; for the others, have it keep them as separate rows.

This sounds like a task for Power Query, but my knowledge is limited in that realm. Any help on how to construct a query that accomplish this task is much appreciated. Thanks.


Solution

  • This seems to work fine

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type any}, {"City", type text}, {"Tags", type text}}),
    // group, then unpivot, remove duplicates
    #"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Data", each Table.Distinct(Table.UnpivotOtherColumns(_, {"ID"}, "Attribute", "Value"), {"Attribute", "Value"}), type table}}),
    // combine all the tags into one cell for later splitting
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Group([Data], {"ID", "Attribute"}, {{"Data", each Text.Combine([Value],","), type text}})),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Attribute", "Data"}, {"Attribute", "Data.1"}),
    // replace null with Title to preserve rows with no data
    #"Replaced Value" = Table.ReplaceValue(#"Expanded Custom",null,"Title",Replacer.ReplaceValue,{"Attribute"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Data"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Data.1"),
    // split the Tags column into any number of columns as needed
    #"Replaced Value1" = Table.ReplaceValue(#"Pivoted Column",null,"xxx",Replacer.ReplaceValue,{"Tags"}),
    DynamicColumnList  = List.Transform({1 ..List.Max(Table.AddColumn(#"Replaced Value1","Custom", each List.Count(Text.PositionOfAny([Tags],{","},Occurrence.All)))[Custom])+1}, each "Tags." & Text.From(_)),
    #"Split Column by Delimiter" =  Table.SplitColumn(   #"Pivoted Column", "Tags",  Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), DynamicColumnList)
    in  #"Split Column by Delimiter"