python numpy scikit-learn sklearn-pandas

ValueError: Found input variables with inconsistent numbers of samples: [1454711, 0]

I'm trying to train my model, but I'm stuck on this error.

#Splitting the data
X_train, X_test, y_train, y_test = train_test_split(tweets,labels, random_state=0)
print (len(X_train),len(X_test),len(y_train),len(y_test))

which produces:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-b1b63fa90541> in <module>()
      1 #Splitting the data
----> 2 X_train, X_test, y_train, y_test = train_test_split(tweets,labels, random_state=0)
      3 print (len(X_train),len(X_test),len(y_train),len(y_test))

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    210     if len(uniques) > 1:
    211         raise ValueError("Found input variables with inconsistent numbers of"
--> 212                          " samples: %r" % [int(l) for l in lengths])
    213 
    214 

ValueError: Found input variables with inconsistent numbers of samples: [1454711, 0]

I don't understand how to fix it. Any suggestions for resolving this error would be appreciated.

Solution

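According to this answer over at DataScience, and as the values "[1454711, 0]" in the error show, your "tweets" and "labels" do not have the same length, which train_test_split() requires. The two numbers are the sample counts of the inputs in the order they were passed, so "tweets" has 1,454,711 samples and "labels" has 0. In other words, "labels" is empty.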

So read that answer, then check where "labels" is created or loaded, since it is empty by the time it reaches train_test_split().
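
One way to catch this before the split is to print the sample count of each input. The following is only a minimal sketch: the helper names sample_counts and split_checked are hypothetical, the toy lists stand in for the real "tweets" and "labels", and it assumes both inputs support len() (lists, NumPy arrays, pandas Series).

```python
def sample_counts(**named_inputs):
    """Return {name: number of samples} for each named input."""
    return {name: len(values) for name, values in named_inputs.items()}


def split_checked(tweets, labels, random_state=0):
    """Hypothetical wrapper: verify the lengths match, then split."""
    counts = sample_counts(tweets=tweets, labels=labels)
    if len(set(counts.values())) > 1:
        # Name the inputs so it is obvious which one is wrong.
        raise ValueError(f"Inputs differ in length: {counts}")
    # Imported here so the length check itself has no dependencies.
    from sklearn.model_selection import train_test_split
    return train_test_split(tweets, labels, random_state=random_state)


# Toy stand-ins for the real data (assumed, not taken from the question).
tweets = ["good", "bad", "fine", "awful"]
labels = []  # empty, like the 0 in the error message

print(sample_counts(tweets=tweets, labels=labels))
# prints {'tweets': 4, 'labels': 0}
```

If the count for "labels" is 0, the problem is in the code that builds "labels", not in train_test_split() itself.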