python excel pandas xlsx

How to save duplicates only?


I wrote code to remove duplicates from a column in my xlsx file.

import pandas as pd


def delete_duplicates(nazov_suboru, cielovy_subor, riadok):
    # nazov_suboru: source file, cielovy_subor: target file,
    # riadok: name of the column to check for duplicates
    data = pd.read_excel(nazov_suboru)
    print("please wait a moment")
    # keep=False drops every row whose value in that column is repeated
    data.drop_duplicates(subset=[riadok], keep=False, inplace=True)
    data.to_excel(cielovy_subor, index=False)
    print("done")

It saves only the unique rows, but I need the opposite: to save only the duplicated ones. I can't figure it out. Any ideas, please?


Solution

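  • data = data[data.duplicated(subset=[riadok], keep=False)]

    would keep only the duplicated rows. With keep=False, duplicated() marks every occurrence of a repeated value as True, so the boolean mask selects all rows whose value in that column appears more than once.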

    See pandas.DataFrame.duplicated
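
To try the idea without an Excel file, here is a minimal sketch on an in-memory DataFrame. The helper name keep_duplicates and the sample data are made up for illustration and are not part of the original code.

```python
import pandas as pd


def keep_duplicates(data, riadok):
    """Return only the rows whose value in column `riadok` occurs more than once."""
    # keep=False marks every occurrence of a repeated value as True
    mask = data.duplicated(subset=[riadok], keep=False)
    return data[mask]


# Hypothetical sample data standing in for the Excel sheet
sample = pd.DataFrame(
    {"name": ["a", "b", "a", "c", "b"], "value": [1, 2, 3, 4, 5]}
)
result = keep_duplicates(sample, "name")
print(result)
```

In the function from the question, swapping the drop_duplicates line for data = keep_duplicates(data, riadok) before the to_excel call should write only the duplicated rows. If only the later repeats are wanted, and not the first occurrence of each value, keep="first" could be used in place of keep=False.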