Search code examples
excelcsvpowerquery

Excel - use sublist delimiter power query without duplicating data


I am unable to transform my CSV in the way that I want. I have looked online, but have been unable to find a resolution to my problem.

I have the following CSV:

name;number;list
"name1";1234;"1. item1
2. item2
3. item3
";"name2";4567;"1. item4
2. item5
3. item6
"

I want to transform that into the below table:

name number list
"name1" 1234 item1
item2
item3
"name2" 4567 item4
item 5
item 6

Except when I use the power query delimiter it gives me a lot of duplicate data, like so:

name number list
"name1" 1234 item1
"name1" 1234 item2
"name1" 1234 item3
"name2" 4567 item4
"name2" 4567 item 5
"name2" 4567 item 6

Solution

  • Given your source CSV (which is pretty badly formed):

    name;number;list
    "name1";1234;"1. item1
    2. item2
    3. item3
    ";"name2";4567;"1. item4
    2. item5
    3. item6
    "
    

    Opening in Power Query, with the semicolon delimiter, results in =>
    enter image description here

    Read the code comments and check each Applied Step for the steps involved in transforming it to your pictured result:

    let
        Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\New Text Document.csv"),
            [Delimiter=";", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"name", type text}, {"number", Int64.Type}, {"list", type text}}),
        
        #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
            each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    
    //Replace blanks in list with "name", then delete "name"
        #"Replace Blank list items" = Table.ReplaceValue(
            #"Removed Blank Rows",
            each [list],
            each [name],
            (x,y,z)=> if x = "" or x = null then z else x,
            {"list"}),
    
    //Remove items and trim "name" column
        #"Remove list items" = Table.ReplaceValue(
            #"Replace Blank list items",
            each [name],
            each [list],
            (x,y,z)=> if y = z then null else x,
            {"name"}),
    
        #"Quote name" = Table.TransformColumns(#"Remove list items",{
            {"name", each """" & Text.Trim(_,{";",""""}) & """", type text}}),
    
        #"Remove Chars from list" = Table.TransformColumns(#"Quote name",{
            {"list", each Text.TrimStart(_,{"0".."9","."," "}), type text}})
    in
        #"Remove Chars from list"
    

    enter image description here