I am trying to use the logistic regression model in MLBase to predict the CTR of ads. My dataset has some categorical variables, and I want to transform them into dummy/indicator variables to use as input to the model. My data looks like this:
"log_time","country","gender"
"2015-05-19","USA","M"
"2015-05-20","IND","F"
Is there a way to do this transformation in MLBase or Scala?
What you're looking for is called one hot encoding.
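Spark's MLlib has a one hot encoder (OneHotEncoder in org.apache.spark.ml.feature) which can do this for you. It works on numeric category indices, so string columns such as country and gender are usually passed through StringIndexer first.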
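To make it concrete what the encoding produces, here is a minimal sketch in plain Scala, with no Spark dependency, applied to the two sample rows from the question. The names OneHotSketch, buildIndex and encode are made up for illustration and are not part of MLBase or MLlib. It also assumes every category gets its own column, whereas a library encoder may drop one level by default.

```scala
// Minimal sketch of one-hot encoding in plain Scala (no Spark dependency).
// OneHotSketch, buildIndex and encode are illustrative names, not library APIs.
object OneHotSketch {
  // Map each distinct category to a column position; sorting keeps the order stable.
  def buildIndex(values: Seq[String]): Map[String, Int] =
    values.distinct.sorted.zipWithIndex.toMap

  // Turn one category into an indicator vector.
  // A category missing from the index yields an all-zero vector.
  def encode(index: Map[String, Int], value: String): Vector[Double] =
    Vector.tabulate(index.size)(i => if (index.get(value).contains(i)) 1.0 else 0.0)

  def main(args: Array[String]): Unit = {
    // The two sample rows from the question: (log_time, country, gender).
    val rows = Seq(("2015-05-19", "USA", "M"), ("2015-05-20", "IND", "F"))
    val countryIndex = buildIndex(rows.map(_._2)) // IND -> 0, USA -> 1
    val genderIndex = buildIndex(rows.map(_._3))  // F -> 0, M -> 1

    rows.foreach { case (_, country, gender) =>
      // Concatenate the per-column indicator vectors into one feature vector.
      val features = encode(countryIndex, country) ++ encode(genderIndex, gender)
      println(features.mkString(","))
    }
    // prints:
    // 0.0,1.0,0.0,1.0
    // 1.0,0.0,1.0,0.0
  }
}
```

On a real dataset you would build the index from the training data only and reuse it when scoring, so that the column positions stay the same between training and prediction.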