
Index of duplicated values in pandas


I'm using pandas in Python, and with duplicated(keep=False) I've checked for repeated values within a single column of a DataFrame. This is the result I get:

5063     True
5064     True
5065     True
5066    False
5067    False
        ...  
5310    False
5311    False
5312     True
5313    False
5314    False
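To make the output above reproducible, here is a small sketch with made-up data. The column name `col` and its values are hypothetical, not taken from the question. It shows that keep=False flags every occurrence of a repeated value, and that the resulting boolean Series keeps the frame's original index labels.

```python
import pandas as pd

# Hypothetical data: one column "col" with a non-default integer index,
# mimicking the shape of the question (labels starting at 5063).
df = pd.DataFrame(
    {"col": ["a", "a", "b", "c", "d", "b"]},
    index=[5063, 5064, 5065, 5066, 5067, 5068],
)

# keep=False marks *every* occurrence of a repeated value as True.
mask = df["col"].duplicated(keep=False)
print(mask)

# For comparison, the default keep="first" leaves the first occurrence False.
first_only = df["col"].duplicated()
```
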

Now I'd like to get the index labels of the True values. I thought this would work:

duplicateRowsDF = df.duplicated(keep=False)
for j in range (0, len(duplicateRowsDF)):
    attempt=str(duplicateRowsDF.iloc[j])
    if attempt=="True":
        duplicated_index=duplicateRowsDF.index

but I'm getting

Index([5063, 5064, 5065, 5066, 5067, 5068, 5069, 5070, 5071, 5072,
   ...
   5305, 5306, 5307, 5308, 5309, 5310, 5311, 5312, 5313, 5314],
  dtype='int64', length=252)

whereas I'm aiming for an "array" containing 5063, 5064, 5065, 5312. I've also tried df[df.index.duplicated(keep=False)], but it doesn't give me what I expect.
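The second attempt behaves differently because Index.duplicated checks for repeated index labels, while DataFrame.duplicated checks for repeated row values. A minimal sketch with hypothetical data (unique labels, repeated values in a made-up column `col`) makes the difference visible:

```python
import pandas as pd

# Hypothetical data: every index label is unique, but "col" repeats "a".
df = pd.DataFrame(
    {"col": ["a", "a", "b", "c"]},
    index=[5063, 5064, 5065, 5066],
)

# Index.duplicated looks for repeated *labels*; none repeat here,
# so the mask is all False and the selection is empty.
by_label = df[df.index.duplicated(keep=False)]

# DataFrame.duplicated looks for repeated *row values*.
by_value = df[df.duplicated(keep=False)]

print(len(by_label))            # 0
print(by_value.index.tolist())  # [5063, 5064]
```
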


Solution

  • Simply perform boolean indexing on the index:

    out = duplicateRowsDF.index[duplicateRowsDF]
    

    Or, without the intermediate variable:

    out = df.index[df.duplicated(keep=False)]
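A self-contained sketch of the answer, using hypothetical data chosen so that the expected labels match the ones mentioned in the question. It also shows a few variants that may be convenient: converting the result to a list or NumPy array, getting 0-based positions with np.flatnonzero, and a corrected version of the loop from the question. The original loop assigns the whole index to duplicated_index every time it meets a True, which is why it returned all 252 labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's frame.
df = pd.DataFrame(
    {"col": ["a", "a", "a", "b", "c", "a"]},
    index=[5063, 5064, 5065, 5066, 5067, 5312],
)

duplicateRowsDF = df.duplicated(keep=False)

# Boolean indexing on the Index keeps only the labels where the mask is True.
out = duplicateRowsDF.index[duplicateRowsDF]
print(out.tolist())  # [5063, 5064, 5065, 5312]

# The same labels as a NumPy array, if an actual array is needed.
labels = out.to_numpy()

# 0-based positions instead of index labels.
positions = np.flatnonzero(duplicateRowsDF.to_numpy())

# The loop from the question, fixed: append the label at position j
# instead of reassigning the whole index on every iteration.
duplicated_index = []
for j in range(len(duplicateRowsDF)):
    if duplicateRowsDF.iloc[j]:
        duplicated_index.append(duplicateRowsDF.index[j])
```

If the frame has more than one column and only one of them should be compared, df["col"].duplicated(keep=False) or df.duplicated(subset=["col"], keep=False) restricts the check to that column ("col" is again a placeholder name). The boolean-indexing form is usually preferable to the loop because it is vectorized.
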