Tags: python, scikit-learn, delete-row, indices, train-test-split

Python: How to delete specific rows in testing data with indices after train/test split


I want to delete every row in X_test and y_test where MFD is greater than one. The problem is that I always get the randomly shuffled original indices back from the train/test split. If I try to drop the rows, I get the following error message:

IndexError: index 3779 is out of bounds for axis 1 with size 3488

I can't use the old indices to drop the rows, so how can I get the new ones where MFD > 1?

  X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                test_size=test_size, 
                                                random_state=random_state, 
                                                stratify=y)



mfd_drop_rows = []
i_nr = 0
for i in X_test.MFD:
   if (i > 1): 
      mfd_drop_rows.append(X_test.index[i_nr])
   i_nr += 1


X_test_new = X_test.drop(X_test.index[mfd_drop_rows]) 
y_test_new = Y_test.drop(Y_test.index[mfd_drop_rows]) 

Thanks for your help ( =


Solution

  • Not sure what MFD is, but assuming that X_test.MFD gives you an array of numbers, you could use a boolean mask to select rows. A simple NumPy example of how to use a mask:

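    import numpy as np

    x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
    mfd = np.array([0.6, 1.3])
    mask = mfd > 1       # True for each row whose MFD exceeds 1
    x_new = x[mask, :]   # keeps the rows where mask is True; use ~mask to drop them instead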
    

    This would give:

    x = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10]]
    mask = [False, True]
    x_new = [[6, 7, 8, 9, 10]]
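
The same idea can be applied directly to the pandas objects from the question. The sketch below is only an illustration: the data values and the `other` column are made up, and `X_test` / `y_test` are built by hand with a shuffled, non-contiguous index to stand in for what train_test_split returns. The IndexError in the question most likely arises because `mfd_drop_rows` already holds index labels, and `X_test.index[mfd_drop_rows]` then treats those labels as positions.

```python
import pandas as pd

# Hypothetical stand-in for the output of train_test_split:
# the index holds the original, shuffled row labels (e.g. 3779).
X_test = pd.DataFrame(
    {"MFD": [0.6, 1.3, 0.2, 2.4], "other": [10, 20, 30, 40]},
    index=[3779, 12, 845, 2051],
)
y_test = pd.Series([0, 1, 0, 1], index=X_test.index, name="target")

# Boolean mask, aligned with X_test by index label.
too_big = X_test["MFD"] > 1

# Option 1: boolean indexing, keeping the rows where MFD <= 1.
X_test_new = X_test.loc[~too_big]
y_test_new = y_test.loc[~too_big]

# Option 2: drop by label. The labels are passed to drop() as they are,
# without indexing into X_test.index a second time.
drop_labels = X_test.index[too_big]
X_test_dropped = X_test.drop(drop_labels)
y_test_dropped = y_test.drop(drop_labels)
```

Both options work on labels or on an aligned mask, so it does not matter that the index is shuffled, and X_test and y_test stay in sync because they share the same index.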