Search code examples
pythonscikit-learnsvc

Does LinearSVC take in qualitative data?


I am trying to predict whether a song is played using the open version or not using python, ski kit learn and the LinearSVC method.

My input data:

enter image description here

I already encoded the product column as 1s and 0s (1 if open 0 if not).

Things like context will have an impact on the product type. I was wondering if I need to make all of the categorical variables numerical for LinearSVC to handle them.


Solution

  • In general, turning categorical features into continuous features is a sub-optimal solution.

    When using a support vector machine as a classifier (or even logistic regression), there should be no issue with handling categorical features that are 0-1 encoded. In cases where you have categorical features that cannot be converted to binary (e.g., your "context" column), I would recommend one-hot-encoding the data (see here first.

    There might be a problem if there are too many unique entries for a particular feature. In that case, the one-hot-encoding will produce as many features as there are unique entries, which could be computationally expensive.