Tags: pandas, grid-search, sklearn-pandas

How to use GridSearchCV with scipy?


I have been trying to tune my SVM using GridSearchCV, but it is throwing errors.

My code is:

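import pandas as pd
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search before scikit-learn 0.18
from sklearn.svm import SVC

train = pd.read_csv('train_set.csv')
label = pd.read_csv('lebel.csv')

params = {'C': [0.01, 0.1, 1, 10]}
clf = GridSearchCV(SVC(), params, n_jobs=-1)
clf.fit(train, label)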

This throws the error: 'too many indices for array'

But when I simply do this:

clf = SVC()
clf.fit(train.data, label.data)

the code works fine.


Solution

  • I suspect the problem lies in your data structures, train.data and label.data. I tested both versions of your code on randomly generated data and they work:

    import numpy as np
    import sklearn.svm as sksvm
    
    X = np.random.rand(1000, 10)  # (1000 x 10) matrix: 1000 points with 10 features
    Y = np.random.randint(0, 2, 1000)  # length-1000 array of binary labels
    
    # Plain SVC, no grid search
    mod = sksvm.SVC()
    mod.fit(X, Y)
    

    Output:

    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
      kernel='rbf', max_iter=-1, probability=False, random_state=None,
      shrinking=True, tol=0.001, verbose=False)
    

    and

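    import numpy as np
    import sklearn.svm as sksvm
    # Note: sklearn.grid_search was removed in scikit-learn 0.20;
    # on newer versions, import sklearn.model_selection as skgs instead.
    import sklearn.grid_search as skgs
    
    params = {'C': [0.01, 0.1, 1, 10]}
    X = np.random.rand(1000, 10)  # (1000 x 10) matrix: 1000 points with 10 features
    Y = np.random.randint(0, 2, 1000)  # length-1000 array of binary labels
    
    # Grid search over C, using all available cores
    mod = skgs.GridSearchCV(sksvm.SVC(), params, n_jobs=-1)
    mod.fit(X, Y)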
    

    Output:

    GridSearchCV(cv=None, error_score='raise',
           estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
      kernel='rbf', max_iter=-1, probability=False, random_state=None,
      shrinking=True, tol=0.001, verbose=False),
           fit_params={}, iid=True, loss_func=None, n_jobs=-1,
           param_grid={'C': [0.01, 0.1, 1, 10]}, pre_dispatch='2*n_jobs',
           refit=True, score_func=None, scoring=None, verbose=0)
    

    If your data is in a DataFrame and a Series, the code still works. You can check this by adding:

    import pandas as pd
    
    X = pd.DataFrame(X)
    Y = pd.Series(Y)
    

    after you generate X and Y.

    It is difficult to say more without a reproducible piece of code, though. You should also probably add the sklearn tag to the question.
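
Since the CSV files are not shown, the following is only a sketch of how the data-structure suspicion above could be checked. It assumes, without confirmation from the question, that the label file has a single column, so that pd.read_csv returns a two-dimensional (n, 1) DataFrame rather than a one-dimensional array. The random data, the column names, and the use of sklearn.model_selection (the newer home of GridSearchCV) are all stand-ins chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Stand-ins for the CSV contents (hypothetical column names).
train = pd.DataFrame(rng.rand(200, 10), columns=[f"f{i}" for i in range(10)])
label = pd.DataFrame({"target": rng.randint(0, 2, 200)})

# Inspect the shapes first: a single-column DataFrame is still 2-D.
print(train.shape)  # (200, 10)
print(label.shape)  # (200, 1)

# Flatten the labels to a 1-D array before fitting.
y = label.values.ravel()
print(y.shape)  # (200,)

params = {"C": [0.01, 0.1, 1, 10]}
clf = GridSearchCV(SVC(), params, cv=3)
clf.fit(train, y)

# The selected value of C and its mean cross-validated score.
print(clf.best_params_, clf.best_score_)
```

If the shapes printed for your own train and label differ from what you expect (for example, an extra index column picked up from the CSV), that is probably the place to look first.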