Search code examples
pythoncombinationscounterdata-analysis

How to count specific elements inside tuples after using Counter() - Python


I would kindly ask you to help me to clarify a doubt about Data Analysis with Python.

I am working on a DataFrame that displays sales data from a fictional company. In order to identify which pairs of products are most often sold together, I created a column named 'Grouped', which groups the products that were registered with the same 'Order ID' (each item is separated by a comma).

After that, I used Counter() from collections and combinations from itertools to count how many times each pair appeared in the 'Grouped' column. Please, check the following code:

from itertools import combinations
from collections import Counter

count_pairs=Counter()

for row in df['Grouped']:
    
    row_list=row.split(',')
    count_pairs.update(Counter(combinations(row_list,2))) # I want to count pairs of two
    
count_pairs.most_common(5)

This is the output that was generated:

[(('iPhone', 'Lightning Charging Cable'), 1004),
 (('Google Phone', 'USB-C Charging Cable'), 987),
 (('iPhone', 'Wired Headphones'), 447),
 (('Google Phone', 'Wired Headphones'), 414),
 (('Vareebadd Phone', 'USB-C Charging Cable'), 361)]

Now, I want to identify what are the products that appear the most in this list, that is, how many times each item appears in the available pairs.

I tried to analyze them individually with the following code:

['Lightning Charging Cable' in x for x in count_pairs].count(True) #output = 37

However, I want to create "a list or a series" and sort the results by the number of times each item appears in the available pairs.

Do you know a way to solve this?

Thank you very much for your help!

Additional info:

As some of you used the top-five pairs (count_pairs.most_common(5)) to solve this problem, I would like to inform that I will need to count the elements based on a 'Counter object', which is 'count_pairs':

count_pairs

The summarized output of 'count_pairs' is:

Counter({('Google Phone', 'Wired Headphones'): 414,
         ('Google Phone', 'USB-C Charging Cable'): 987,
         ('Bose SoundSport Headphones', 'Bose SoundSport Headphones'): 27,
         ('AAA Batteries (4-pack)', 'Google Phone'): 11,
         ('Lightning Charging Cable', 'USB-C Charging Cable'): 58,....}) 
#the original output has 313 pairs

Solution

  • Here's how I would do this:

    from collections import Counter
    import pandas as pd
    
    data = [(('iPhone', 'Lightning Charging Cable'), 1004),
     (('Google Phone', 'USB-C Charging Cable'), 987),
     (('iPhone', 'Wired Headphones'), 447),
     (('Google Phone', 'Wired Headphones'), 414),
     (('Vareebadd Phone', 'USB-C Charging Cable'), 361)]
    
    df = pd.DataFrame(sorted(Counter([x[0][1] for x in data]).items(), key= lambda x : x[1]))
    

    To apply the same transformation to the original count_pairs data structure, just remove the [0] from x[0][1], like this:

    df = pd.DataFrame(sorted(Counter([x[1] for x in count_pairs]).items(), key= lambda x : x[1]))
    

    The effective difference between the two structures is that data has an extra outer list that doesn't exist in count_pairs.