I'm having difficulties mining a big (100K entries) dataset of mine concerning logistics transportation. I have around 10 nominal String attributes (i.e. city/region/country names, customers/vessel identification codes, etc.). Along with those, I have one date attribute "departure" and one ratio-scaled numeric attribute "goal".
What I'm trying to do is using a training set to find out which attributes have strong correlations with "goal" and then validating these patterns by predicting the "goal" value of entries in a test set.
I assume clustering, classification and neural networks could be useful for this problem, so I used RapidMiner, Knime and elki and tried to apply some of their tools on my data. However, most of these tools only handle numeric data, so I got no useful results.
Is it possible to transform my nominal attributes into numeric ones? Or do I need to find different algorithms that can actually handle nominal data?
you most likely want to use tree based algorithm. These are good to use nominal features. Please be aware, that you do not want to use "id-like" attributes.
I would recommend RapidMiner's AutoModel feature as a start. GBT and RandomForest should work well.
Best, Martin