Index out of bounds error even with .iloc in Pandas

fraud_indices = np.array(dataset[dataset.Class == 1].index)
fraud_samples = dataset.iloc[fraud_indices, :]

I am using the above code on a dataset that has a column "Class" containing 0s and 1s. What I am trying to do is simple: I obtain the indices in the dataset where Class == 1 and use them to take a subset.

However, I get the error "positional indexers are out-of-bounds" on the second line, even though the indices are obtained from the same dataset. How can they be out of bounds? Can someone please help?

Solution

I think you only need boolean indexing:

fraud_samples = dataset[dataset.Class == 1]

And if you need the indices:

fraud_indices = fraud_samples.index
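
As a self-contained illustration of the two lines above, here is a minimal sketch. The small DataFrame and its index values are made up for the example and only stand in for the real dataset:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, with a non-default index
dataset = pd.DataFrame({'Class': [0, 1, 1, 0]}, index=[10, 20, 30, 40])

# Boolean mask: True for rows where Class == 1
mask = dataset['Class'] == 1

# Select the matching rows directly; no positional lookup is involved
fraud_samples = dataset[mask]

# Index labels of the matching rows
fraud_indices = fraud_samples.index
print(fraud_indices.tolist())
```

Because the mask is aligned with the DataFrame's own index, this should work regardless of what the index labels are.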

However I get the error "positional indexers are out-of-bounds" at the second line even though the indices are obtained from the same dataset. How can they be out of bounds?

The reason is that your index is not the default one (0, 1, 2, ...), so some index labels are greater than or equal to the length of the DataFrame. iloc selects by integer position, not by index label as loc does.

Sample:

import numpy as np
import pandas as pd

dataset = pd.DataFrame({'Class':[0,1,0,1]}, index=[0,1,3,5])
print (dataset)
   Class
0      0
1      1
3      0
5      1

fraud_indices = np.array(dataset[dataset.Class == 1].index)
print (fraud_indices)
[1 5]

You cannot select the sixth row (position 5, because Python counts from 0) with DataFrame.iloc, because the DataFrame has only four rows:

fraud_samples = dataset.iloc[fraud_indices, :]
print (fraud_samples)

IndexError: positional indexers are out-of-bounds

But selecting by index labels with DataFrame.loc works:

fraud_samples = dataset.loc[fraud_indices, :]
print (fraud_samples)
   Class
1      1
5      1
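
If you really do need integer positions for iloc, one possible approach (a sketch, not the only way) is to compute positions from the boolean mask with np.flatnonzero, or to translate labels into positions with Index.get_indexer. The variable names below are illustrative, and the data is the same sample as above:

```python
import numpy as np
import pandas as pd

# Same sample data as above, with a non-default index
dataset = pd.DataFrame({'Class': [0, 1, 0, 1]}, index=[0, 1, 3, 5])

# Integer positions (not labels) of the rows where Class == 1
positions = np.flatnonzero((dataset['Class'] == 1).to_numpy())

# Positions are always within bounds, so iloc is safe here
fraud_samples = dataset.iloc[positions, :]

# Alternative: convert existing labels to positions
label_positions = dataset.index.get_indexer([1, 5])

print(positions.tolist())
print(fraud_samples.index.tolist())
```

Here the positions are 1 and 3, while the corresponding labels are 1 and 5, which is exactly the mismatch that caused the original error.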