machine-learning scikit-learn train-test-split

sklearn train_test_split confusion


I get an error when running the code below. What could be causing it?

X = [['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility',
       'Item_Type', 'Item_MRP', 'Outlet_Identifier',
       'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type',
       'Outlet_Type']]
y = ['Item_Outlet_Sales']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80,test_size=0.20)

The error I am getting:

ValueError: With n_samples=1, test_size=0.2 and train_size=0.8, the resulting train set will be empty. Adjust any of the aforementioned parameters.

Solution

  • I believe you are working with a pandas DataFrame, and your X and y, as defined by the lines:

    X = [['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility', 'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type', 'Outlet_Type']]
    y = ['Item_Outlet_Sales']
    

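    are only lists of column names, not the data stored in those columns. X is a list containing a single inner list, so train_test_split sees n_samples=1 and cannot split it into non-empty train and test sets. If your data frame is named df, created with something like df = pd.DataFrame(your_data), you should try: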

    X = df[['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility', 'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type', 'Outlet_Type']]
    y = df[['Item_Outlet_Sales']]
    

    This extracts the actual X and y data from the data frame, which can then be passed to train_test_split.
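
The difference can be seen in a minimal sketch. The data frame below is a hypothetical stand-in with made-up values and only three of the columns from the question; the real df would come from your own data. Here y is selected with single brackets so that it is a Series, which is an optional choice and not required for the split to work.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 rows, made-up values, three of the columns.
df = pd.DataFrame({
    "Item_Weight": [9.3, 5.92, 17.5, 19.2, 8.93, 10.4, 13.65, 16.2, 7.72, 11.8],
    "Item_MRP": [249.8, 48.3, 141.6, 182.1, 53.9, 51.4, 57.7, 107.8, 96.9, 187.8],
    "Item_Outlet_Sales": [3735.1, 443.4, 2097.3, 732.4, 994.7,
                          556.6, 343.6, 4022.8, 1076.6, 4710.5],
})

# A list of column names wrapped in another list has length 1,
# which is where n_samples=1 in the error message comes from.
names_only = [["Item_Weight", "Item_MRP"]]
n_samples_from_list = len(names_only)

# Indexing the data frame with the column names returns the actual rows.
X = df[["Item_Weight", "Item_MRP"]]
y = df["Item_Outlet_Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, test_size=0.20, random_state=0
)

print(n_samples_from_list)
print(X_train.shape, X_test.shape)
```

With 10 rows and an 80/20 split, the training set gets 8 rows and the test set gets 2, so neither side is empty.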