How to subtract a sample data set from its parent data set based on index?


I have a main data set of 10 rows and I create a test data set by sampling 3 random rows from it. How can I get a train data set with the remaining 7 rows? In other words, I need to subtract the test data set from the main data set. However, I need to do this without using train_test_split().

import numpy as np
import pandas as pd

data = pd.DataFrame({'job_title': np.random.choice(['data_science', 'Data_analysis'], 10),
                     'level': np.random.choice(['entry', 'senior'], 10),
                     'salary': np.random.choice(80000, 10)})


data
        job_title   level   salary
0   Data_analysis   senior  33929
1   Data_analysis   senior  45698
2   data_science    senior  33607
3   Data_analysis   senior  65818
4   Data_analysis   senior  66095
5   Data_analysis   entry   4718
6   data_science    senior  74770
7   data_science    entry   3707
8   data_science    senior  26820
9   Data_analysis   entry   23887

test_data = data.sample(3, random_state=17)

test_data

        job_title   level   salary
7   Data_analysis   senior  27174
2   data_science    senior  58579
5   data_science    senior  26554

I want to get a train_data frame that looks like the one below:

        job_title   level   salary
0   Data_analysis   senior  20003
1   data_science    entry   28083
3   Data_analysis   senior  12906
4   data_science    senior  45588
6   Data_analysis   senior  59851
8   Data_analysis   senior  52008
9   data_science    entry   32207

Solution

  • All you have to do is take the row index of the test_data DataFrame and drop those rows from data. DataFrame.drop returns a new DataFrame, so data itself is left unchanged.

    train_data=data.drop(test_data.index)
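
    To check that the split behaves as expected, here is a minimal end-to-end sketch. The seeded generator (rng, seed 0) and the name train_alt are assumptions added for illustration and are not part of the question; the values produced will differ from the tables shown above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed value is an assumption).
rng = np.random.default_rng(0)

data = pd.DataFrame({
    'job_title': rng.choice(['data_science', 'Data_analysis'], 10),
    'level': rng.choice(['entry', 'senior'], 10),
    'salary': rng.choice(80000, 10),
})

# Sample 3 rows for the test set; the sampled rows keep their original index labels.
test_data = data.sample(3, random_state=17)

# Drop those labels from the parent frame to get the remaining 7 rows.
train_data = data.drop(test_data.index)

# Equivalent boolean-mask form: keep rows whose label is not in the test index.
train_alt = data.loc[~data.index.isin(test_data.index)]

print(len(train_data), len(test_data))
print(train_data.index.intersection(test_data.index).empty)
```

    Both forms rely on the index labels being unique, as they are with the default RangeIndex here. If the parent frame has duplicate labels, drop removes every row carrying a sampled label, so calling data.reset_index(drop=True) before sampling is one way to avoid that.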