For now I have following table in Excel, which makes it inconvenient to use for some further processing.
ID | Segments |
---|---|
1 | Food |
2 | Automation |
3 | Mechatronics |
4 | Automation;Mechatronics |
What I would like to achieve, is one hot encoding of this data, which should look as follows. I'm trying to use Power Query for that.
ID | Food | Automation | Mechatronics |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
3 | 0 | 0 | 1 |
4 | 0 | 1 | 1 |
I have found similar question but with a simpler example, where we always have just a single value in a column. I have tried suggested approach but modified it a bit, by splitting my column by ;
into multiple ones and then trying to pivot for every one of a splatted columns. Unfortunately it's resulting in an error as we are trying to create duplicated columns that way.
There is at least 30 distinct segments I may have. It seems that there are at most 5 of those combined for single record. Adding this note so that it's easier to find a balance between automation and manual work.
In worst case when there is no simple solution using Power Query, I can do this operation using Python (don't know how to do it in this complex case too thought) and reimport it later on.
In Power Query, Split Column by Delimiter, there is an Advanced option to Split into Rows.
Add a custom column of just the number 1, then pivot