python python-3.x pandas scikit-learn sklearn-pandas

Can we predict two columns of a DataFrame at the same time?

Here is a sample.csv file with three columns of integer data.
Predicting a single column works fine,
but I get an error when I try to predict two columns, col2 and col3, together.

col1,col2,col3
1,5,1
3,6,5
8,5,2
6,4,2
6,9,5

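import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

data = pd.read_csv('sample.csv')
input = data                            # features: every column, targets included
objective = data[["col2","col3"]]       # targets: two columns
xtr,xtst,ytr,ytst = train_test_split(input,objective,test_size=0.25,
                                       train_size=0.75,random_state=4)
classifier = SVR()
classifier.fit(xtr,ytr)                 # raises the ValueError shown below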


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 149, in fit
    X, y = check_X_y(X, y, dtype=np.float64, order='C', accept_sparse='csr')
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 547, in check_X_y
    y = column_or_1d(y, warn=True)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 583, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (3, 2)

Solution

If you are open to using regressors other than support vector machines, take a look here:

http://scikit-learn.org/stable/modules/multiclass.html

On that page, check the classifiers listed as inherently multiclass and try their corresponding regressor estimators. For example, DecisionTreeClassifier is listed there, and DecisionTreeRegressor supports multiple outputs too. The reason I point to estimators that handle this inherently is that they can use the correlations between the output values to learn better.
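As a rough illustration of that first option, here is a minimal sketch that fits a DecisionTreeRegressor on a two-column target. The rows are copied inline from sample.csv so the snippet runs on its own. Using col1 as the only feature is my assumption, not something the question states, and the names X, y, tree and pred are just illustrative.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# The five rows from sample.csv, inlined so the sketch is self-contained.
data = pd.DataFrame({
    "col1": [1, 3, 8, 6, 6],
    "col2": [5, 6, 5, 4, 9],
    "col3": [1, 5, 2, 2, 5],
})

X = data[["col1"]]            # assumption: col1 is the only feature
y = data[["col2", "col3"]]    # two target columns

# A single tree is fitted on both targets at once.
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X, y)

pred = tree.predict(X)
print(pred.shape)  # one row per sample, one column per target
```

With only five rows this says nothing about accuracy. It only shows that a 2-D target is accepted without any wrapper.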

If you want to stay with SVR, you can wrap it in MultiOutputRegressor, for example:

from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# xtr and ytr are the training splits from the question
classifier = MultiOutputRegressor(SVR())
classifier.fit(xtr,ytr)

Keep in mind that this only makes the code more convenient: internally it still fits one output at a time. In this case it fits two separate SVR models (one for each output column), so it cannot make use of any correlation between the outputs.
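To see the one-model-per-output behaviour for yourself, you can inspect the fitted wrapper's estimators_ attribute. The sketch below again inlines the rows from sample.csv and assumes col1 is the only feature. The names model, n_fitted and pred are illustrative.

```python
import pandas as pd
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# The five rows from sample.csv, inlined so the sketch is self-contained.
data = pd.DataFrame({
    "col1": [1, 3, 8, 6, 6],
    "col2": [5, 6, 5, 4, 9],
    "col3": [1, 5, 2, 2, 5],
})

X = data[["col1"]]            # assumption: col1 is the only feature
y = data[["col2", "col3"]]    # two target columns

model = MultiOutputRegressor(SVR())
model.fit(X, y)

# The wrapper keeps one independently fitted SVR per target column.
n_fitted = len(model.estimators_)
print(n_fitted)

pred = model.predict(X)
print(pred.shape)  # one row per sample, one column per target
```

Each entry in estimators_ is an ordinary SVR trained on a single target column, which is why the wrapper cannot share information between col2 and col3.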