python pandas scikit-learn regression probability

sklearn deterministic regression with multiple tags


I have a regression problem in Python. My input dataset looks like this:

X = means, deviations, variances, varianceOfVariance

y = walk, slow, run, hold

The X features are numeric values, and Y is binary-tagged with exactly one of 4 categories: walk, slow, run, or hold. data.head() looks like this:

[image of the data.head() output]

I am able to split the pandas DataFrame into X_train, X_test, y_train, y_test with the train_test_split() method.

I want to build a regressor (e.g. an SVM or a linear regressor) that gives predictions for these tags in a format like this: 70% walk, 25% slow, 0% run, 5% hold.

It has to be probabilistic. I tried a classifier, combining the tags into one variable, but now I want a probability for each tag instead.

Is this possible with the sklearn library, and if so, how? I can't figure it out.


Solution

  • Most scikit-learn classifiers are already probabilistic: you can get the probability of every class, not just the most likely class, using predict_proba (for SVC, construct it with probability=True):

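    # clf is any scikit-learn classifier that implements predict_proba
    clf.fit(X_train, y_train)
    score = clf.score(X_test, y_test)  # mean accuracy on the test set
    probs = clf.predict_proba(X_test)  # one row per sample; columns follow clf.classes_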
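
For a fuller picture, here is a minimal end-to-end sketch of the same idea. It makes several assumptions that are not in the question: the data is synthetic, the column names (`mean`, `deviation`, and so on) and the `activity` label name are made up for illustration, the four binary tag columns are collapsed into a single label with `idxmax`, and `LogisticRegression` stands in for whatever classifier you prefer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): four numeric features
# and four binary tag columns, exactly one of which is 1 per row.
rng = np.random.default_rng(0)
labels = ["walk", "slow", "run", "hold"]
true_label = pd.Series(rng.choice(labels, size=400), name="activity")

# Shift each class's features a little so there is something to learn.
offsets = true_label.map({"walk": 0.0, "slow": 1.0, "run": 2.0, "hold": 3.0}).to_numpy()
X = pd.DataFrame(
    rng.normal(size=(400, 4)) + offsets[:, None],
    columns=["mean", "deviation", "variance", "variance_of_variance"],
)
Y_onehot = pd.get_dummies(true_label, dtype=int)  # columns: hold, run, slow, walk

# Collapse the one-hot tags into a single label column for the classifier.
y = Y_onehot.idxmax(axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# One row per test sample, one column per class, ordered as in clf.classes_.
probs = pd.DataFrame(
    clf.predict_proba(X_test), columns=clf.classes_, index=X_test.index
)

# Format the first test sample in the "70% walk, 25% slow, ..." style.
first = probs.iloc[0]
summary = ", ".join(f"{p:.0%} {label}" for label, p in first.items())
print(summary)
```

Each row of `probs` sums to 1, and the column order follows `clf.classes_` (sorted alphabetically for string labels), so it is safer to look probabilities up by column name than by position.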