Search code examples
pythonexcelpowerquery

Convert a single column with a list of values into one hot encoding using Power Query


For now I have following table in Excel, which makes it inconvenient to use for some further processing.

ID Segments
1 Food
2 Automation
3 Mechatronics
4 Automation;Mechatronics

What I would like to achieve, is one hot encoding of this data, which should look as follows. I'm trying to use Power Query for that.

ID Food Automation Mechatronics
1 1 0 0
2 0 1 0
3 0 0 1
4 0 1 1

I have found similar question but with a simpler example, where we always have just a single value in a column. I have tried suggested approach but modified it a bit, by splitting my column by ; into multiple ones and then trying to pivot for every one of a splatted columns. Unfortunately it's resulting in an error as we are trying to create duplicated columns that way.

There is at least 30 distinct segments I may have. It seems that there are at most 5 of those combined for single record. Adding this note so that it's easier to find a balance between automation and manual work.

In worst case when there is no simple solution using Power Query, I can do this operation using Python (don't know how to do it in this complex case too thought) and reimport it later on.


Solution

  • In Power Query, Split Column by Delimiter, there is an Advanced option to Split into Rows.

    enter image description here

    Add a custom column of just the number 1, then pivot

    enter image description here

    enter image description here