Search code examples
powerbipowerquerym

Transform categorical column into dummy columns using Power Query M


Using Power Query "M" language, how would you transform a categorical column containing discrete values into multiple "dummy" columns? I come from the Python world and there are several ways to do this but one way would be below:

>>> import pandas as pd
>>> dataset = pd.DataFrame(list('ABCDACDEAABADDA'),
               columns=['my_col'])
>>> dataset
   my_col
0       A
1       B
2       C
3       D
4       A
5       C
6       D
7       E
8       A
9       A
10      B
11      A
12      D
13      D
14      A
>>> pd.get_dummies(dataset)
    my_col_A  my_col_B  my_col_C  my_col_D  my_col_E
0          1         0         0         0         0
1          0         1         0         0         0
2          0         0         1         0         0
3          0         0         0         1         0
4          1         0         0         0         0
5          0         0         1         0         0
6          0         0         0         1         0
7          0         0         0         0         1
8          1         0         0         0         0
9          1         0         0         0         0
10         0         1         0         0         0
11         1         0         0         0         0
12         0         0         0         1         0
13         0         0         0         1         0
14         1         0         0         0         0

Solution

  • Interesting question. Here's an easy, scalable method I've found:


    1. Create a custom column of all ones (Add Column > Custom Column > Formula = 1).

    2. Add an index column (Add Column > Index Column).

    3. Pivot on the custom column (select my_col > Transform > Pivot Column).

    4. Replace null values with 0 (select all columns > Transform > Replace Values).


    Here's what the M code looks like for this process:

        #"Added Custom" = Table.AddColumn(#"Previous Step", "Custom", each 1),
        #"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1),
        #"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[my_col]), "my_col", "Custom"),
        #"Replaced Value" = Table.ReplaceValue(#"Pivoted Column",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Pivoted Column"))
    

    Once you've completed the above, you can remove the index column if desired.