I have a number of columns in my table which have text values that fall into categories - e.g. column "ABC" has 9000 rows but every row must have a value in the set {"A","B","C"}. Other columns like Gender have "M"/"F"/null
For each column, I'd like to convert it into an integer list in-place - so A:1, B:2, C:3 etc.
I've been trying out using List.Distinct
to extract the values to a temp table, adding an index column to that and using a join to transform the initial column based on that mapping in the temp table. However this seems slow and I'm not sure how to run this over all columns in my table (or at least Table.ColumnsOfType(Source, {type nullable text})
to select the categorical columns...).
Any suggestions?
Gender | Fruit | [...] |
---|---|---|
F | Cat | |
F | Dog | |
M | Lemon | |
M | Dog | |
M | Lemon | |
null | Cat | |
M | Dog |
Gender | Fruit | [...] |
---|---|---|
1 | 1 | |
1 | 2 | |
2 | 3 | |
2 | 2 | |
2 | 3 | |
null | 1 | |
2 | 2 |
In PowerQuery, this seems to work for any number of columns
Replace all nulls with something else, here +=+
Add Index
Unpivot
Remove duplicates
Group, add index to each group
Merge back into original and expand
Repivot
Remove extra columns
Before and After:
Full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Replaced Value" = Table.ReplaceValue(Source,null,"+=+",Replacer.ReplaceValue,Table.ColumnNames(Source)),
#"Added Index" = Table.AddIndexColumn(#"Replaced Value", "Index", 0, 1),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
// derive a table of replacements
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns", {"Attribute", "Value"}),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Attribute"}, {{"GRP", each Table.AddIndexColumn(_, "Index2", 1, 1), type table}}),
#"Expanded GRP" = Table.ExpandTableColumn(#"Grouped Rows", "GRP", {"Value", "Index2"}, {"Value", "Index2"}),
//replace originals
#"Merged Queries" = Table.NestedJoin(#"Unpivoted Other Columns",{"Attribute", "Value"},#"Expanded GRP",{"Attribute", "Value"},"EG",JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "EG", {"Index2"}, {"Index2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Table1",{"Value"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Index2", List.Sum),
#"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns1"