Search code examples
excelpowerquerycombinationsm

Calculate combinations for a range of text values in excel


I do not know enough about M in PowerQuery to work out how to solve the below.

The example is, I have an excel range of over 300 rows, each with 12 columns that may contain strings of meta-data (these are single or two word strings) about an original information source. Many of the 12 columns have null values as not all rows have 12 strings of meta-data. I have included a fake sample of the data to be clear what I'm talking about.

ID Input column 1 Input column 2 Input column 3 Input column 4
1 A B D C
2 G E F
3 A F

Note: I have deliberately mixed the order of alphabet characters to show that the order does not matter (not a permutation algorithm).

I want like to be able to count the combinations of all column strings in all rows without counting a single column string as it is not a combination, or repeating any column string in a row's possible combinations. So, although order does not matter, repeating any combination must be excluded for the combination to contribute to the counting algorithm. I have tried looking at options on stackoverflow and they don't quite seem to work as they often make repeat combinations of strings.

So if we take the calculation for row ID=1 only, then the desired output would be,

ID Output column 1 Output column 2 Output column 3 Output column 4
1 A B
2 A C
3 A D
4 B C
5 B D
6 C D
7 A B C
8 A B D
9 A C D
10 B C D
11 A B C D

I think that's all of them. Let me know if I missed a unique combination! Solutions sorted alphabetically within a row (see the above table) is preferable to tidy up the results.

The following table is an example of what I don't want. The value of A is being combined with itself as output in both rows ID1 and ID2. Row ID3 is also not permitted, as it is not counting a combination, it is counting a unique value that has not been combined.

ID Output column 1 Output column 2 Output column 3
1 A A
2 A C A
3 A

Solutions in powerquery would be handy as the excel solutions are pretty complicated it seems - I could be wrong though, as I don't know how to solve it.


Solution

  • To implement @rachel algorithm in Power Query, given your data:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, 
            {"Input column 1", type text}, {"Input column 2", type text}, 
            {"Input column 3", type text}, {"Input column 4", type text}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type", "Combination Count", 
            each 
                let 
                    n = List.Count(List.RemoveNulls(List.Skip(Record.FieldValues(_))))
                in 
                    Number.Power(2,n) - n -1,Int64.Type)
    in
        #"Added Custom"
    

    enter image description here

    Algorithm for selecting the relevant columns In the Added Custom step:

    • Record.FieldValues(_) returns each value in that row in a List
    • List.Skip then skips the first entry (the ID column)

    If the above algorithm doesn't return the relevant columns in your actual data, there are other ways of doing this, either hard-coded or a different algorithm, depending on the nature of your actual data.

    As pointed out by @SamNseir, should any of the cells in the row have identical entries, adding List.Distinct to the algorithm in #"Added Custom" would be necessary:

    #"Added Custom" = Table.AddColumn(#"Changed Type", "Combination Count", 
            each 
                let 
                    n = List.Count(
                            List.RemoveNulls(
                                List.Distinct(
                                    List.Skip(Record.FieldValues(_)))))
                in 
                    Number.Power(2,n) - n -1,Int64.Type)
    

    Edit
    If you want to have both the output columns and the Count as an output, it can be a little tricky as Power Query can only output a single table.

    One way to do this in a single table is as shown below, where the input rows are repeated for each row in the output table (along with the Count and ID columns.

    I am using a custom function to generate the combinations. So first add the code below as a blank query and name it fnCombos

    //https://community.powerbi.com/t5/Community-Blog/Combinatorics-in-Power-Query-part1/bc-p/2221143/highlight/true#M3151
    //  See answer of Alexis Olson
    
    //fnCombos
    //  combinations of all entries in a single list/column
    
    (L as list) as list =>
    
    let
        N = List.Count(L),
        Subsets =
            List.Transform(
                {0..Number.Power(2, N)-1},
                (i) => List.Transform(
                    {0..N-1},
                    (j) => if Number.Mod(Number.IntegerDivide(i, Number.Power(2, j)), 2) = 1
                        then L{j}
                        else null
                        )
            ),
      Concatenate = List.Transform(Subsets, each Text.Combine(List.RemoveNulls(_), ","))
    in   
        Concatenate
    

    Then use the following for your main code:

    • Be sure to read the code comments for important information
    • Should work with a variable number of input columns, provided they all start with the word Input. If they don't, then we will need some other method of selection.
    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        inputCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(_,"Input")),
        #"Changed Type" = Table.TransformColumnTypes(Source, {{"ID", Int64.Type}} &
             List.Transform(inputCols, each {_, type text})),
        #"Added Counts" = Table.AddColumn(#"Changed Type", "Combination Count", 
            each 
                let 
                    n = List.Count(
                            List.RemoveNulls(
                                List.Distinct(
                                    Record.FieldValues(Record.SelectFields(_,inputCols)))))
                in 
                    Number.Power(2,n) - n -1,Int64.Type),
        #"Added Combos" = Table.AddColumn(#"Added Counts", "Combinations", (r)=>
            let 
                //Select the Input columns only
                //Add List.Distinct if you might have duplicate strings in any of the input columns
                vals = List.RemoveNulls(
                            List.Distinct(Record.FieldValues(
                                Record.SelectFields(r, inputCols)))),                               
    
                //combos that meet your specifications will all contain a comma.
                combos = List.Select(fnCombos(List.Sort(vals)), each Text.Contains(_,","))
    
            in 
                Table.FromList(combos,Splitter.SplitTextByDelimiter(","), List.Count(inputCols))),
    
        outCols = Table.ColumnNames(#"Added Combos"{0}[Combinations]),
        outCols2 = List.Transform(outCols, each "Output " & _),
        #"Expanded Combinations" = Table.ExpandTableColumn(#"Added Combos", "Combinations",
            outCols,outCols2),
       #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Combinations", List.Transform(outCols2, each {_, type text}))
    in
        #"Changed Type1"
    

    Results
    enter image description here