Hi, I am trying to do one-hot encoding in Orange in order to run a market basket analysis.
Currently my transaction data looks like this in my CSV:
C# | Items | | |
---|---|---|---|
C1 | Apple | Orange | |
C2 | Baby Milk | Apple | Orange |
I would like to know what steps I can take to process the data, in Orange or in other software, so that it ends up in this form:
C# | Apple | Orange | Baby Milk |
---|---|---|---|
C1 | 1 | 1 | 0 |
C2 | 1 | 1 | 1 |
Currently, when I try to preprocess the data in Orange using "Continuize Discrete Variables - One feature per value", I get a separate column for each individual feature value instead.
It is not entirely straightforward, but you could concatenate the products of each transaction into one field, separated by a comma or semicolon, pass it to Corpus, apply tokenization with a Regex that splits on your separator character, and then use Bag of Words from the Text add-on. I have tried the result with the Associate add-on, and it seems to work.
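
If doing the conversion outside Orange is an option, here is a minimal sketch in plain Python (standard library only). It assumes the CSV is comma-delimited, has one header row, keeps the transaction ID in the first column and leaves unused item cells blank. The sample data and the function name `one_hot_baskets` are made up for illustration; they only mirror the layout shown in the question.

```python
import csv
import io

# Hypothetical sample mirroring the layout in the question: the first column
# is the transaction ID, the remaining columns hold items (blank if unused).
RAW_CSV = """C#,Items,,
C1,Apple,Orange,
C2,Baby Milk,Apple,Orange
"""


def one_hot_baskets(csv_text):
    """Return (item_names, rows); rows maps transaction ID -> list of 0/1 flags."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row

    baskets = {}
    for record in reader:
        if not record:
            continue  # ignore completely empty lines
        transaction_id = record[0].strip()
        # Collect the non-blank item cells of this transaction as a set.
        baskets[transaction_id] = {cell.strip() for cell in record[1:] if cell.strip()}

    # One column per distinct item, in alphabetical order.
    item_names = sorted(set().union(*baskets.values()))
    rows = {
        tid: [int(name in items) for name in item_names]
        for tid, items in baskets.items()
    }
    return item_names, rows


item_names, rows = one_hot_baskets(RAW_CSV)
print(["C#"] + item_names)
for tid, flags in rows.items():
    print([tid] + flags)
```

The columns come out sorted alphabetically, so their order differs from the table in the question, but the 0/1 values are the same. To read a real file, replace `io.StringIO(csv_text)` with an opened file handle. The resulting rows can be written back out with `csv.writer` and loaded into Orange for the Associate add-on.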