Tags: python, collections, unique

How to find the unique items in a whole data set


I have a data set with around 60,000 rows. Each row is a purchase order, and the orders have no unique ID. Sample data below:

36 40 41 42 43 45 46
38 39 48 50 51 57
41 59 62
63 66 67 68
74 75 76 77

In the sample above, each number identifies an item purchased. I need the following:

  1. The total number of unique items in the data set.
  2. The top 5 most-purchased items.

Solution

  • This should do it:

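    from collections import Counter
    
    # Tally every item across all orders; each line of the file is one order.
    items = Counter()
    with open('data_file.txt', 'r') as f:
        for line in f:
            # split() with no arguments splits on any run of whitespace.
            items.update(line.split())
    
    # Each distinct item is one key in the Counter.
    print("Total Unique Items: {0}".format(len(items)))
    
    # most_common(5) returns the five (item, count) pairs with the highest counts.
    for item, count in items.most_common(5):
        print("Item {0} was purchased {1} times".format(item, count))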
    

    Yes, it's that short :).
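
One thing to watch: a `Counter` fed with `line.split()` counts every occurrence, so an item listed twice in the same order is counted twice. If "most purchased" should mean "appears in the most orders", a possible variation is sketched below. The `count_items` helper and its `once_per_order` flag are hypothetical names made up for this illustration, and the sketch assumes one order per line, as in the sample from the question. Note that the items stay strings (for example `'41'`), since nothing here converts them to integers.

```python
from collections import Counter


def count_items(lines, once_per_order=False):
    """Count items across an iterable of whitespace-separated lines.

    Hypothetical helper: assumes each line is one purchase order.
    """
    items = Counter()
    for line in lines:
        tokens = line.split()
        # With once_per_order=True, repeats inside a single order are
        # collapsed, so each order adds at most 1 to an item's count.
        items.update(set(tokens) if once_per_order else tokens)
    return items


# The sample rows from the question, held in memory instead of a file.
sample = [
    "36 40 41 42 43 45 46",
    "38 39 48 50 51 57",
    "41 59 62",
    "63 66 67 68",
    "74 75 76 77",
]

counts = count_items(sample)
print("Total Unique Items: {0}".format(len(counts)))  # 23
print(counts.most_common(1))  # [('41', 2)]
```

Because an open file object is also an iterable of lines, the same helper should work as `count_items(f)` inside the `with open(...)` block from the answer above.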