Tags: python, scikit-learn, train-test-split

How to split a tuple using train_test_split?


X = (569,30)
y = (569,)
X_train, X_test, y_train, y_test = train_test_split(np.asarray(X),np.asarray(y),test_size = 0.25, random_state=0)

I expect the following output:

  • X_train has shape (426, 30)
  • X_test has shape (143, 30)
  • y_train has shape (426,)
  • y_test has shape (143,)

But I am getting the following error:

ValueError: Found input variables with inconsistent numbers of samples: [2, 1]

I know I can get the desired output another way. All the similar problems I found online were caused by X and y having different lengths, but that is not the problem in my case.


Solution

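  • It seems that you're misunderstanding what train_test_split does. It does not take the shapes of the input arrays; it splits the arrays themselves into train and test sets. In your code, X and y are tuples describing shapes, so np.asarray turns them into arrays of 2 and 1 elements, which is where the [2, 1] in the error comes from. You must feed it the actual arrays, for instance: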

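    import numpy as np
    from sklearn.model_selection import train_test_split

    # Build actual arrays with the desired shapes (random data for illustration)
    X = np.random.rand(569, 30)
    y = np.random.randint(0, 2, 569)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)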
    

    print(X_train.shape)
    print(X_test.shape)
    print(y_train.shape)
    print(y_test.shape)
    
    (426, 30)
    (143, 30)
    (426,)
    (143,)
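
  • To see the cause of the error directly, it may help to inspect what np.asarray produces from the two tuples in the question. This is a minimal sketch using only NumPy; the names X_tuple, y_tuple, X_arr and y_arr are illustrative and not from the original post.

```python
import numpy as np

# The tuples from the question describe shapes; they are not the data itself.
X_tuple = (569, 30)
y_tuple = (569,)

X_arr = np.asarray(X_tuple)  # 1-D array holding the two numbers 569 and 30
y_arr = np.asarray(y_tuple)  # 1-D array holding the single number 569

# train_test_split treats the first dimension as the number of samples,
# so it sees 2 "samples" in X and 1 "sample" in y.
print(X_arr.shape)             # (2,)
print(y_arr.shape)             # (1,)
print(len(X_arr), len(y_arr))  # 2 1
```

    The lengths 2 and 1 are the values reported in the ValueError, so the inputs really do have inconsistent numbers of samples, just not the ones the question had in mind.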