fraud_indices = np.array(dataset[dataset.Class == 1].index)
fraud_samples = dataset.iloc[fraud_indices, :]
I am using the above code on a dataset that has a column "Class" containing 0s and 1s. What I am trying to do is simple: I obtain the indices of the rows where Class == 1 and use them to build a subset.
However, I get the error "positional indexers are out-of-bounds" on the second line, even though the indices come from the same dataset. How can they be out of bounds? Can someone please help?
I think you only need boolean indexing:
fraud_samples = dataset[dataset.Class == 1]
and if you also need the indices:
fraud_indices = fraud_samples.index
However I get the error "positional indexers are out-of-bounds" at the second line even though the indices are obtained from the same dataset. How can they be out of bounds?
The reason is that your index is not the default one (0, 1, 2, ...), so some index labels are greater than or equal to the number of rows in the DataFrame. The function iloc selects by position, not by index label as loc does.
Sample:
dataset = pd.DataFrame({'Class':[0,1,0,1]}, index=[0,1,3,5])
print (dataset)
   Class
0      0
1      1
3      0
5      1
fraud_indices = np.array(dataset[dataset.Class == 1].index)
print (fraud_indices)
[1 5]
You cannot select the 6th row (Python counts from 0, so position 5) with DataFrame.iloc, because the DataFrame has only 4 rows:
fraud_samples = dataset.iloc[fraud_indices, :]
print (fraud_samples)
IndexError: positional indexers are out-of-bounds
But it works if you select by index labels with DataFrame.loc:
fraud_samples = dataset.loc[fraud_indices, :]
print (fraud_samples)
   Class
1      1
5      1
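If you really do want to use iloc, you need positions rather than labels. Below is a minimal sketch, assuming the same sample DataFrame as above; the names mask, fraud_positions and fraud_labels are only illustrative. It builds both the positions and the labels from one boolean mask and shows that the three ways of selecting give the same rows.

```python
import numpy as np
import pandas as pd

# Same sample as above: the index has gaps, so labels and positions differ.
dataset = pd.DataFrame({"Class": [0, 1, 0, 1]}, index=[0, 1, 3, 5])

# Boolean mask as a plain NumPy array (one entry per row, in row order).
mask = (dataset["Class"] == 1).to_numpy()

# Positions of the True entries: these are safe to pass to iloc.
fraud_positions = np.flatnonzero(mask)

# Index labels of the same rows: these are what loc expects.
fraud_labels = dataset.index[mask]

by_position = dataset.iloc[fraud_positions, :]  # positional selection
by_label = dataset.loc[fraud_labels, :]         # label-based selection
by_mask = dataset[mask]                         # plain boolean indexing
```

Here fraud_positions holds the positions 1 and 3, while fraud_labels holds the labels 1 and 5. In most cases the boolean-indexing form is the simplest choice, since it avoids the label/position distinction entirely.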