Tags: python, python-3.x, pandas, scikit-learn, sklearn-pandas

Can we predict two columns of a DataFrame at the same time?


Here is a sample.csv file with three columns of integer data.
Predicting a single column works fine,
but I get an error when I try to predict two columns, col2 and col3, together.

col1,col2,col3
1,5,1
3,6,5
8,5,2
6,4,2
6,9,5

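import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

data = pd.read_csv('sample.csv')
features = data[["col1"]]             # predictor column only, so the targets do not leak into the inputs
objective = data[["col2", "col3"]]    # the two columns to predict
xtr, xtst, ytr, ytst = train_test_split(features, objective, test_size=0.25,
                                        train_size=0.75, random_state=4)
classifier = SVR()
classifier.fit(xtr, ytr)              # fails here: SVR accepts only a 1-D target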


 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 149, in fit
    X, y = check_X_y(X, y, dtype=np.float64, order='C', accept_sparse='csr')
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 547, in check_X_y
    y = column_or_1d(y, warn=True)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 583, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (3, 2) 

Solution

  • If you are open to regressors other than support vector machines, see the scikit-learn user guide section on multiclass and multioutput algorithms.

    There, look for the classifiers listed as inherently multiclass and try their corresponding regressor estimators. For example, DecisionTreeClassifier is listed, and DecisionTreeRegressor supports multiple outputs as well. The reason to prefer estimators that handle this natively is that they can use the correlations between the output values to learn better.
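As a minimal sketch of the natively multi-output route, the snippet below fits a DecisionTreeRegressor on both target columns in one call. The data is the sample.csv content from the question, typed inline so the snippet is self-contained. Treating col1 as the only feature is an assumption made for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Same values as sample.csv in the question, built inline for a self-contained sketch.
data = pd.DataFrame({
    "col1": [1, 3, 8, 6, 6],
    "col2": [5, 6, 5, 4, 9],
    "col3": [1, 5, 2, 2, 5],
})

X = data[["col1"]]            # assumption: col1 is the only feature
y = data[["col2", "col3"]]    # two target columns, shape (5, 2)

# One tree is fitted against both target columns at once.
model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)

predictions = model.predict(X)
print(predictions.shape)      # one row per sample, one column per target
```

With only five rows the predictions themselves mean little; the point is that fit accepts the two-column target without a wrapper.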

    If you want to stay with SVR, you can wrap it in MultiOutputRegressor, as in this example:

    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import SVR

    # Wrap SVR so that a separate model is fitted for each target column.
    classifier = MultiOutputRegressor(SVR())
    classifier.fit(xtr, ytr)  # xtr, ytr come from the train_test_split call in the question
    

    Keep in mind that this only simplifies the code: internally it still fits one output at a time. In this case it fits two SVR models (one per output), so it cannot exploit any correlation between the outputs.
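The one-model-per-output behaviour can be checked directly. The sketch below, again using the question's sample data inline and assuming col1 is the only feature, fits the wrapper and then inspects its estimators_ attribute, which holds the fitted sub-models.

```python
import pandas as pd
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Same values as sample.csv in the question, built inline for a self-contained sketch.
data = pd.DataFrame({
    "col1": [1, 3, 8, 6, 6],
    "col2": [5, 6, 5, 4, 9],
    "col3": [1, 5, 2, 2, 5],
})

X = data[["col1"]]            # assumption: col1 is the only feature
y = data[["col2", "col3"]]    # two target columns

wrapper = MultiOutputRegressor(SVR())
wrapper.fit(X, y)

# After fitting, estimators_ holds one independently trained SVR per target column.
for column, estimator in zip(y.columns, wrapper.estimators_):
    print(column, type(estimator).__name__)

predictions = wrapper.predict(X)   # columns follow the order of the target columns
print(predictions.shape)
```

Each entry in estimators_ was trained on a single target column, which is why the wrapper cannot share information between col2 and col3.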