I have a main data set of 10 rows, and I create a test data set from 3 random rows. How can I get a train data set with the remaining 7 rows, i.e. subtract the test data set from the main data set? I need to do this without using train_test_split().
import numpy as np
import pandas as pd

data = pd.DataFrame({'job_title': np.random.choice(['data_science', 'Data_analysis'], 10),
                     'level': np.random.choice(['entry', 'senior'], 10),
                     'salary': np.random.choice(80000, 10)})
data
job_title level salary
0 Data_analysis senior 33929
1 Data_analysis senior 45698
2 data_science senior 33607
3 Data_analysis senior 65818
4 Data_analysis senior 66095
5 Data_analysis entry 4718
6 data_science senior 74770
7 data_science entry 3707
8 data_science senior 26820
9 Data_analysis entry 23887
test_data = data.sample(3, random_state=17)
test_data
job_title level salary
7 Data_analysis senior 27174
2 data_science senior 58579
5 data_science senior 26554
I want to get a train_data frame that looks like the one below:
job_title level salary
0 Data_analysis senior 20003
1 data_science entry 28083
3 Data_analysis senior 12906
4 data_science senior 45588
6 Data_analysis senior 59851
8 Data_analysis senior 52008
9 data_science entry 32207
All you have to do is take the row index of the test_data DataFrame and drop those rows from data:
train_data = data.drop(test_data.index)
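For reference, here is a minimal end-to-end sketch of the approach above. The seeded generator (`rng`) and the name `train_alt` are my own additions for illustration and are not part of the question. It assumes the index of `data` is unique, as it is with the default RangeIndex; since drop() removes rows by label, a repeated index label could remove more rows than were sampled.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (seed chosen arbitrarily).
rng = np.random.default_rng(0)

data = pd.DataFrame({
    "job_title": rng.choice(["data_science", "Data_analysis"], 10),
    "level": rng.choice(["entry", "senior"], 10),
    "salary": rng.integers(0, 80000, 10),
})

# Pick 3 random rows for the test set; their original index labels are kept.
test_data = data.sample(3, random_state=17)

# Drop those labels from the main frame to get the remaining 7 rows.
train_data = data.drop(test_data.index)

# Equivalent boolean-mask form: keep rows whose label is not in the test set.
train_alt = data[~data.index.isin(test_data.index)]

print(len(train_data), len(test_data))  # 7 3
```

Both forms keep the original index labels, so train_data and test_data can still be traced back to rows in data.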