Search code examples
excelpowerquerym

Powerquery: split multi-value cell to below empty cells


I am importing a lot of tables from pdf-files into Excel by Powerquery - which works pretty well.

Beside several other migrations I have the following task which I am not able to solve:

In some cases - esp. after page breaks - single values that should go into single cells (one below the other) are placed into one cell joined by linebreaks and below cells are empty.

sample of cell with multi-value and expected result

I need to split the values of such a cell (cell-content contains line-breaks) and put 2nd to n value into the according empty cells below this cell.

(It's kind of a "splitted drill-down" ...)

I am pretty new to M (not to VBA or programming) but I am not able to find a working solution.


Solution

  • This is difficult to do robustly but you can expand using Text.Split on the line feed delimiter as @horseyride suggests and remove the blank rows on that second column and then smash the columns back together with Table.FromColumns.

    Here's an example you can paste into the Advanced Editor:

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUoEYkMDpVgdCDcJiI0gXCMgMxkkawrnpsTkpcbkpYEEjeCCIJ4FMs8IImcMZKaDJI3h3AwQ11wpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Week = _t, A = _t, B = _t]),
        TransformA = List.Select(List.Combine(List.Transform(Source[A], each Text.Split(_, "#(lf)"))), each Text.Length(_) > 0),
        FromCols = Table.FromColumns({Source[Week], TransformA, Source[B]}, {"Week", "A", "B"})
    in
        FromCols
    

    This takes a starting table like this: Start

    Transforms the A column as a list, splitting each element on the line feed character, combining each result back together, and filtering out null and empty strings:

    Transform

    The final step takes columns Week and B from the original table and sticks the transformed column A in the middle:

    Result

    You'll run into trouble if the number of extra expanded rows doesn't exactly match the number of blank rows removed but this should work under the assumption that they do match.