python algorithm statistics cluster-analysis data-science

Find patterns of similar features / product combinations (preferably in Python)

Let's say I have a CSV file with the following structure (800k records), and I want to identify existing patterns of product combinations (e.g. a pattern showing that products X, Y and Z are often bought together):

Customer_ID | Product_ID | Revenue
    1             A          X
    1             B          X
    1             C          X
    2             A          X
    2             D          X
    3             A          X
    4             F          X

How would you approach this from a data science perspective? Which methods would you use, and what steps would you need to take (e.g. pseudocode of the approach you would recommend, preferably in Python)?

Thank you so much for your help. It is highly appreciated! Regards, Simon

Solution

There is a standard data mining task known as

Frequent itemset mining

aka market basket analysis.

It looks at products frequently bought together.
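A minimal pure-Python sketch of the idea, assuming each customer counts as one "basket" (the data has no order/transaction ID, so this is an assumption) and ignoring revenue. It uses the classic Apriori approach: count single items, then repeatedly join frequent itemsets into larger candidates, prune any candidate with an infrequent subset, and keep those whose support (fraction of baskets containing them) reaches `min_support`. It then derives association rules `lhs -> rhs` with confidence = support(lhs ∪ rhs) / support(lhs). The names `build_baskets`, `apriori`, `rules` and `ROWS` are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample: (Customer_ID, Product_ID) pairs from the question's table.
ROWS = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "D"), (3, "A"), (4, "F")]


def build_baskets(rows):
    """Group product IDs into one set ("basket") per customer (assumption)."""
    baskets = defaultdict(set)
    for customer_id, product_id in rows:
        baskets[customer_id].add(product_id)
    return list(baskets.values())


def apriori(baskets, min_support):
    """Return {frozenset(itemset): support} for itemsets with support >= min_support."""
    n = len(baskets)
    # Level 1 candidates: every single product seen in any basket.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Support = fraction of baskets that contain every item of the candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples for rules lhs -> rhs."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out


baskets = build_baskets(ROWS)
frequent = apriori(baskets, min_support=0.25)
rule_list = rules(frequent, min_confidence=0.8)
for lhs, rhs, supp, conf in rule_list:
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

With the real CSV you would build the rows from the `Customer_ID` and `Product_ID` columns instead of the hard-coded sample. Counting every candidate against every basket is fine for a toy example, but for 800k records an optimized library implementation (e.g. of Apriori or FP-Growth) is likely the better choice; a `min_support` that is too low will also blow up the number of candidates.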

You really should read an introductory data mining book and the Wikipedia articles on the topic first...