python pandas matplotlib data-science data-visualization

Better way to show duplicates in Pandas


dups_df = df.pivot_table(columns=['DstAddr'], aggfunc='size')
print(dups_df)

I am using this code block to show the duplicates, but I would like to see the output sorted by count (most frequent first), and ideally with a better visualization. How can I do this?



Solution

  • You can use the duplicated method, as shown below:

    print(df[df.duplicated(subset='DstAddr')])
    

    You can see the whole documentation at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

    Another way is the value_counts method, which sorts by count in descending order by default:

    print(df.value_counts(subset='DstAddr', ascending=False))
    

    Documentation at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html

    To visualize this, you can chain a plot method onto value_counts (the default is a line plot; pass kind='bar' for a bar chart):

    df.value_counts(subset='DstAddr', ascending=False).plot()
    

    Documentation at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html
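
Putting the pieces together, here is a minimal sketch of one way to list the duplicates sorted by frequency and draw them as a bar chart. The sample data, the SrcAddr column, and the helper name plot_counts are made up for illustration; only the DstAddr column name comes from the question. It also uses keep=False, since duplicated by default leaves the first occurrence of each value unmarked.

```python
import pandas as pd

# Hypothetical sample data; only the column name DstAddr comes from the question.
df = pd.DataFrame({
    "SrcAddr": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5", "10.0.0.6"],
    "DstAddr": ["8.8.8.8", "1.1.1.1", "8.8.8.8", "9.9.9.9", "8.8.8.8", "1.1.1.1"],
})

# Count how often each address occurs; value_counts sorts by count,
# most frequent first, by default.
counts = df["DstAddr"].value_counts()

# Keep only the addresses that appear more than once.
dup_counts = counts[counts > 1]
print(dup_counts)

# keep=False marks every row of a duplicated group, including the first one.
dup_rows = df[df.duplicated(subset="DstAddr", keep=False)]
print(dup_rows)


def plot_counts(counts, top_n=10):
    """Draw a horizontal bar chart of the top_n most frequent values."""
    # Sort ascending before plotting so the largest bar ends up on top.
    ax = counts.head(top_n).sort_values().plot(kind="barh")
    ax.set_xlabel("Occurrences")
    ax.set_ylabel("DstAddr")
    ax.set_title("Most frequent destination addresses")
    return ax


ax = plot_counts(dup_counts)
```

In a notebook the chart renders inline; in a plain script you would call matplotlib.pyplot.show() afterwards. A horizontal bar chart is only one choice here, picked because long address labels stay readable on the y-axis.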